Abstract


Existing  approaches  for  providing  guaranteed  services  require  routers  to  manage  per  flow  states 
and  perform  per  flow  operations  [3,  13].  Such  a  stateful  network  architecture  is  less  scalable 
and  robust  than  stateless  network  architectures  like  the  original  IP  and  the  recently  proposed 
Diffserv  [12].  However,  services  provided  with  current  stateless  solutions,  Diffserv  included,  have 
lower  flexibility,  utilization,  and/or  assurance  level  as  compared  to  the  services  that  can  be 
provided  with  per  flow  mechanisms. 

In  this  paper,  we  propose  techniques  that  do  not  require  per  flow  management  (either  control 
or  data  planes)  at  core  routers,  but  can  implement  guaranteed  services  with  levels  of  flexibility, 
utilization,  and  assurance  similar  to  those  that  can  be  provided  with  per  flow  mechanisms.  In 
this  way  we  can  simultaneously  achieve  high  quality  of  service,  high  scalability  and  robustness. 
The  key  technique  we  use  is  called  Dynamic  Packet  State  (DPS),  which  provides  a  lightweight 
and  robust  mechanism  for  routers  to  coordinate  actions  and  implement  distributed  algorithms. 
We  present  an  implementation  of  the  proposed  algorithms  that  has  minimum  incompatibility 
with  IPv4. 


1  Introduction 


Current  IP  networks  provide  one  simple  service:  the  best-effort  datagram  delivery.  Such  a  simple 
service  model  allows  IP  routers  to  be  stateless:  except  routing  state,  which  is  highly  aggregated, 
routers  do  not  keep  any  other  fine  grain  information  about  traffic.  Providing  a  minimalist  service 
model  and  having  the  “stateless  waist”  in  the  protocol  hourglass  allows  the  Internet  to  scale  with 
both  the  size  of  the  network  and  heterogeneous  applications  and  technologies.  Together,  they 
are  two  of  the  most  important  technical  reasons  behind  the  success  of  the  Internet. 

As  the  Internet  evolves  into  a  global  communication  infrastructure,  there  is  a  growing  need 
to  support  more  sophisticated  services  (e.g.,  traffic  management,  QoS)  than  the  traditional  best- 
effort  service.  Two  classes  of  solutions  emerge:  those  maintaining  the  stateless  property  of  the 
original  IP  architecture,  and  those  requiring  a  new  stateful  architecture.  Examples  of  stateless 
solutions  are  RED  for  congestion  control  [15]  and  Differentiated  Service  (Diffserv)  [12]  for  QoS. 
The  corresponding  examples  of  stateful  solutions  are  Fair  Queueing  [11]  for  congestion  control 
and  Integrated  Service  (Intserv)  [3]  for  QoS.  In  general,  stateful  solutions  can  provide  more 
powerful  and  flexible  services.  For  example,  compared  with  RED,  Fair  Queueing  can  protect 
well-behaving flows from misbehaving ones and accommodate heterogeneous end-to-end congestion
control algorithms [20, 25]. Similarly, as discussed in Section 2, services provided by Intserv
solutions have higher flexibility, utilization, and/or assurance level than those provided by Diffserv
solutions. However, as also discussed in Section 2, stateful solutions are less scalable and
robust  than  their  stateless  counterparts. 

The question we want to answer is: is it possible to have the best of the two worlds, i.e.,
providing services as powerful as those implemented by stateful networks, while utilizing algorithms
as scalable and robust as those used in stateless networks?

While  we  cannot  answer  the  above  question  in  its  full  generality,  we  can  answer  it  in  some 
specific  cases  of  practical  interest.  We  consider  a  network  architecture  similar  to  the  Diffserv 
architecture, called Scalable Core or SCORE, in which only edge routers perform per flow
management, while core routers do not. As illustrated in Figure 1, the goal of a SCORE network is
to  approximate  the  service  provided  by  a  reference  stateful  network.  In  [29]  we  have  shown  that 
a  SCORE  network  can  achieve  fair  bandwidth  allocation  by  approximating  the  service  provided 
by  a  reference  network  in  which  every  node  performs  fair  queueing. 

In this paper, we will show that a SCORE network can provide end-to-end per flow delay and
bandwidth guarantees as defined in Intserv. Current Intserv solutions assume a stateful network
in which two types of per flow state are needed: forwarding state, which is used by the forwarding
engine to ensure fixed path forwarding, and QoS state¹, which is used by both the admission
control module in the control plane and the classifier and scheduler in the data plane. In [30], we
have proposed an algorithm that implements fixed path forwarding with no per flow forwarding
state. In this paper, we focus on techniques to eliminate the need for core nodes to keep per flow
QoS state. In particular, we propose two algorithms: one for the data plane to schedule packets,
and the other for the control plane to perform admission control. Neither requires per flow state
at core routers.

¹ In the context of RSVP, we use “QoS” state to refer to both the flow spec and the filter spec.

Figure 1: (a) A reference stateful network whose functionality is approximated by (b) a Scalable Core
(SCORE) network. In SCORE only edge nodes perform per flow management; core nodes do not
perform per flow management.

The key technique used to implement a SCORE network is Dynamic Packet State (DPS).
With  DPS,  each  packet  carries  in  its  header  some  state  that  is  initialized  by  the  ingress  router. 
Core  routers  process  each  incoming  packet  based  on  the  state  carried  in  the  packet’s  header, 
updating  both  its  internal  state  and  the  state  in  the  packet’s  header  before  forwarding  it  to  the 
next  hop  (see  Figure  2).  By  using  DPS  to  coordinate  actions  of  edge  and  core  routers  along  the 
path  traversed  by  a  flow,  distributed  algorithms  can  be  designed  to  approximate  the  behavior  of 
a  broad  class  of  stateful  networks  using  networks  in  which  core  routers  do  not  maintain  per  flow 
state. 

The rest of the paper is organized as follows. In Section 2, we give an overview of Intserv
and Diffserv, and discuss the tradeoffs of these two architectures in providing QoS. In Sections 3
and 4 we present the details of our data and control path algorithms, respectively. Section 5
describes a design and a prototype implementation of the proposed algorithms in IPv4 networks.
This demonstrates that it is indeed possible to implement algorithms with Dynamic Packet State
techniques that have minimum incompatibility with existing protocols. Finally, we conclude the
paper in Section 7.

Figure 2: An illustration of the Dynamic Packet State (DPS) technique used to implement a SCORE
network: (a-b) upon a packet arrival the ingress node inserts some state into the packet header; (b-c) a
core node processes the packet based on this state, and eventually updates both its internal state and
the packet state before forwarding it; (c-d) the egress node removes the state from the packet header.

2  Intserv  and  Diffserv 

To  support  QoS  in  the  Internet,  the  IETF  has  defined  two  architectures:  the  Integrated  Services 
or  Intserv  [3],  and  the  Differentiated  Services  or  Diffserv  [12].  They  have  important  differences  in 
both  service  definitions  and  implementation  architectures.  At  the  service  definition  level,  Intserv 
provides  end-to-end  guaranteed  [26]  or  controlled  load  service  [37]  on  a  per  flow  (individual  or 
aggregate)  basis,  while  Diffserv  provides  a  coarser  level  of  service  differentiation  among  a  small 
number  of  traffic  classes.  At  the  implementation  level,  current  Intserv  solutions  require  each 
router  to  process  per  flow  signaling  messages  and  maintain  per  flow  data  forwarding  and  QoS  state 
on  the  control  path,  and  to  perform  per  flow  classification,  scheduling,  and  buffer  management 
on the data path. Performing per flow management inside the network affects both the network
scalability and robustness. The former is because the complexities of these per flow operations
usually increase as a function of the number of flows; the latter is because it is difficult to maintain
the consistency of dynamic and replicated per flow state in a distributed network environment.
As pointed out by Clark in [5]: “because of the distributed nature of the replication, algorithms to
ensure robust replication are themselves difficult to build, and few networks with distributed state
information provide any sort of protection against failure.” While there are several proposals that
aim to reduce the number of flows inside the network by aggregating micro-flows that follow the
same path into one macro-flow [2, 18], they only alleviate this problem, but do not fundamentally
solve it: the number of macro-flows can still be quite large in a network with many edge routers,
as the number of paths is a quadratic function of the number of edge nodes.
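To make the quadratic growth concrete, the following minimal sketch counts macro-flows under the assumption, made here purely for illustration, that there is one macro-flow per ordered (ingress, egress) pair of edge routers. The function name is hypothetical and not part of any of the cited proposals.

```python
def macro_flow_count(num_edge_routers: int) -> int:
    """Macro-flows, assuming one per ordered (ingress, egress) pair (illustrative)."""
    n = num_edge_routers
    return n * (n - 1)


for n in (10, 100, 1000):
    print(n, macro_flow_count(n))
```

Under this assumption, growing the edge from 100 to 1,000 routers multiplies the number of macro-flows by roughly 100, so aggregation postpones the scaling problem rather than removing it.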

Diffserv,  on  the  other  hand,  distinguishes  between  edge  and  core  routers.  While  edge  routers 
process  packets  on  the  basis  of  finer  traffic  granularity,  such  as  per  flow  or  per  organization, 
core  routers  do  not  maintain  fine  grain  state,  and  process  packets  based  on  a  small  number 
of  Per  Hop  Behaviors  (PHBs)  encoded  by  bit  patterns  in  the  packet  header.  By  pushing  the 
complexity  to  the  edge  and  maintaining  a  simple  core,  Diffserv’s  data  plane  is  much  more  scalable 
than  Intserv.  However,  Diffserv  still  needs  to  address  the  problem  of  admission  control  on  the 
control  path.  One  proposal  is  to  use  a  centralized  bandwidth  broker  that  maintains  the  topology 
as  well  as  the  state  of  all  nodes  in  the  network.  In  this  case,  the  admission  control  can  be 
implemented  by  the  broker,  eliminating  the  need  for  maintaining  distributed  reservation  state. 
Such  a  centralized  approach  is  more  appropriate  for  an  environment  where  most  flows  are  long 
lived,  and  set-up  and  tear-down  events  are  rare.  To  support  fine  grain  and  dynamic  flows,  there 
may  be  a  need  for  a  distributed  broker  architecture,  in  which  the  broker  database  is  replicated 
or  partitioned.  Distributed  broker  architectures  are  still  an  active  area  of  research.  One  can 
envision  an  architecture  in  which,  when  a  broker  receives  a  request,  it  makes  an  acceptance  or 
rejection  decision  based  on  its  own  database,  without  consulting  other  brokers.  This  eliminates 
the  need  for  a  signaling  protocol,  but  requires  another  protocol  to  maintain  the  consistency  of 
the  different  broker  databases.  However,  since  it  is  impossible  to  achieve  perfect  consistency, 
this  may  lead  to  race  conditions  and/or  resource  fragmentation.  In  particular,  since  requests 
which  arrive  simultaneously  at  different  brokers  may  want  to  reserve  capacity  along  the  same 
link,  each  broker  can  independently  allocate  only  a  fraction  of  the  link  capacity  without  running 
the  risk  of  over-provisioning.  This  translates  into  a  fundamental  trade-off  between  scalability 
and fragmentation: while increasing the number of brokers makes the solution more scalable, it
also  increases  resource  fragmentation. 
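The trade-off can be illustrated with a toy model. The sketch below assumes (a simplification introduced here, not a mechanism described in the text) that the capacity of a link is split statically and evenly among the brokers, and that each broker admits requests using only its own share. The function name and the numbers are hypothetical.

```python
def largest_admissible_request(link_capacity: float, num_brokers: int) -> float:
    """Largest reservation one broker can accept without consulting the others,
    assuming an even static split of the link capacity (illustrative model)."""
    return link_capacity / num_brokers


capacity_mbps = 100.0
for brokers in (1, 4, 10):
    share = largest_admissible_request(capacity_mbps, brokers)
    print(f"{brokers} broker(s): each may allocate up to {share:.1f} Mbps")
```

In this model, with 10 brokers a single 20 Mbps request would be rejected by every broker even if the 100 Mbps link were idle, which is the kind of fragmentation discussed above.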

While Diffserv is more scalable than Intserv in terms of implementation, services provided
with existing Diffserv solutions usually have lower flexibility, utilization, and assurance levels than
Intserv services. Two examples of differentiated service models are the assured service [6, 7] and
the premium service [22]. The assured service is a form of statistical service and achieves lower
assurance  than  guaranteed  service.  The  premium  service  provides  the  equivalent  of  a  dedicated 
link  of  fixed  bandwidth  between  two  edge  nodes.  However,  as  we  have  shown  in  Appendix  A, 
in  order  for  the  premium  service  to  achieve  service  assurance  comparable  to  the  guaranteed 
service, even with a relatively large queueing delay bound (e.g., 200 ms), the fraction of bandwidth
that  can  be  allocated  to  premium  service  traffic  has  to  be  very  low  (e.g.,  10%).  It  is  debatable 
whether  these  numbers  should  be  of  significant  concern.  For  example,  low  utilization  by  the 
premium  traffic  may  be  acceptable  if  the  majority  of  traffic  will  be  best  effort,  either  because 
the  best  effort  service  is  “good  enough”  for  most  applications  or  the  price  difference  between 
premium  traffic  and  best  effort  traffic  is  too  high  to  justify  the  performance  difference  between 
them.  Alternatively,  if  the  guaranteed  nature  of  service  assurance  is  not  needed,  i.e.,  statistical 
service  assurance  is  sufficient  for  premium  service,  higher  network  utilization  can  be  achieved. 
Providing  meaningful  statistical  service  is  still  an  open  research  problem.  A  discussion  of  these 
topics  is  beyond  the  scope  of  this  paper.  For  the  remaining  sections  of  the  paper,  we  assume  that 
it  is  a  desirable  goal  to  provide  guaranteed  service  and  at  the  same  time  achieve  high  resource 
utilization. 

In  summary,  Intserv  provides  more  powerful  service  but  has  serious  limitations  with  respect 
to  network  scalability  and  robustness.  Diffserv  is  more  scalable,  but  cannot  provide  services  that 
are  comparable  to  Intserv.  In  addition,  scalable  and  robust  admission  control  for  Diffserv  is  still 
an  open  research  problem. 

3  QoS  Scheduling  Without  Per  Flow  State 

Current  Intserv  solutions  assume  a  stateful  network  in  which  each  router  maintains  per  flow  QoS 
state.  The  state  is  used  by  both  the  admission  control  module  in  the  control  plane  and  the 
classifier  and  scheduler  in  the  data  plane. 

In this paper, we propose scheduling and admission control algorithms that provide guaranteed
services  but  do  not  require  core  routers  to  maintain  per  flow  state.  In  this  section,  we  present 
techniques  that  eliminate  the  need  for  data  plane  algorithms  to  use  per  flow  state  at  core  nodes. 
In  particular,  at  core  nodes,  packet  classification  is  no  longer  needed  and  packet  scheduling  is 
based  on  the  state  carried  in  packet  headers,  rather  than  per  flow  state  stored  locally  at  each 
node.  In  Section  4,  we  will  show  that  fully  distributed  admission  control  can  also  be  achieved 
without  the  need  for  maintaining  per  flow  state  at  core  nodes. 

The  main  idea  behind  our  solution  is  to  approximate  a  reference  stateful  network  with  a 
SCORE  network.  The  key  technique  used  to  implement  approximation  algorithms  is  Dynamic 
Packet State (DPS). With DPS, each packet carries some state which is initialized by the ingress
node, and then updated by core nodes along the packet’s path. The state is used by nodes
traversed by the packet to coordinate actions and implement distributed algorithms. On the
data path, our algorithm aims to approximate a network with every node implementing the
Delay-Jitter-Controlled Virtual Clock (Jitter-VC) algorithm. We make this choice for several
reasons. First, unlike various Fair Queueing algorithms [11, 24], in which a packet’s deadline can
depend on state variables of all active flows, in Virtual Clock a packet’s deadline depends only on
the state variables of the flow it belongs to. This property of Virtual Clock makes the algorithm
easier to approximate in a SCORE network. In particular, the fact that the deadline of each
packet can be computed exclusively from the state variables of the flow it belongs to makes
it possible to eliminate the need to replicate and maintain per flow state at all nodes along
the path. Instead, per flow state can be stored only at the ingress node, inserted into the packet
header by the ingress node, and retrieved later by core nodes, which then use it to determine
the packet’s deadline. Second, by regulating traffic inside the network using delay-jitter-controllers
(discussed below), it can be shown that with very high probability, the number of packets in the
server at any given time is significantly smaller than the number of flows (see Section 3.3). This
helps to simplify the scheduler.

In the remainder of this section, we will first describe the implementation of Jitter-VC using
per flow state, then present our algorithm, called Core-Jitter-VC (CJVC), which uses the technique
of Dynamic Packet State (DPS). In Appendix B we present an analysis to show that a
network of routers implementing CJVC provides the same delay bound as a network of routers
implementing the Jitter-VC algorithm.


3.1  Jitter  Virtual  Clock  (Jitter-VC) 

Jitter-VC is a non-work-conserving version of the Virtual Clock algorithm [40]. It uses a combination
of a delay-jitter rate-controller [33, 39] and a Virtual Clock scheduler. The algorithm
works as follows: each packet is assigned an eligible time and a deadline upon its arrival. The
packet is held in the rate-controller until it becomes eligible, i.e., the system time exceeds the
packet’s eligible time (see Figure 3(a)). The scheduler then orders the transmission of eligible
packets according to their deadlines.

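For the $k$-th packet of flow $i$, its eligible time $e_{i,j}^k$ and deadline $d_{i,j}^k$ at the $j$-th node on its path
are computed as follows:

$$e_{i,j}^1 = a_{i,j}^1,$$
$$e_{i,j}^k = \max\left(a_{i,j}^k + g_{i,j-1}^k,\; d_{i,j}^{k-1}\right), \quad i, j \geq 1,\; k > 1 \qquad (1)$$
$$d_{i,j}^k = e_{i,j}^k + \frac{l_i^k}{r_i}, \quad i, j, k \geq 1 \qquad (2)$$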

Notation        Comments
$p_i^k$         the $k$-th packet of flow $i$
$l_i^k$         length of $p_i^k$
$a_{i,j}^k$     arrival time of $p_i^k$ at node $j$
$s_{i,j}^k$     sending time of $p_i^k$ at node $j$
$e_{i,j}^k$     eligible time of $p_i^k$ at node $j$
$d_{i,j}^k$     deadline of $p_i^k$ at node $j$
$g_{i,j}^k$     time ahead of schedule: $g_{i,j}^k = d_{i,j}^k + \tau_j - s_{i,j}^k$
$\delta_i^k$    slack delay of $p_i^k$
$\pi_j$         propagation delay between nodes $j$ and $j+1$
$\tau_j$        transmission time of a maximum size packet at node $j$

Table 1: Notations used in Section 3.

where $l_i^k$ is the length of the packet, $r_i$ is the reserved rate for the flow, $a_{i,j}^k$ is the packet’s arrival
time at the $j$-th node traversed by the packet, and $g_{i,j-1}^k$, stamped into the packet header by the
previous node, is the amount of time the packet was transmitted before its schedule, i.e., the
difference between the packet’s deadline and its actual departure time at the $(j-1)$-th node. Note
that the packet deadline is actually inflated by $\tau_j$, i.e., the transmission time of a packet of
maximum size between nodes $j$ and $j+1$. This correction is needed because a packet can miss
its deadline by $\tau_j$ [40].
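As a minimal sketch of the per flow computation that a stateful Jitter-VC node performs, the code below keeps the reserved rate and the deadline of the previous packet for one flow and applies the max-based eligible time and the deadline rule described above. The class and parameter names are hypothetical, units (bits and seconds) are chosen for illustration, and treating the first packet of a flow as eligible on arrival is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class JitterVCFlowState:
    """Per flow state kept by a Jitter-VC node (hypothetical class name)."""
    rate: float                             # reserved rate r_i, in bits per second
    last_deadline: Optional[float] = None   # deadline of the previous packet

    def on_arrival(self, arrival: float, ahead: float, length: float) -> Tuple[float, float]:
        """Return (eligible_time, deadline) for one packet.

        arrival: arrival time a at this node
        ahead:   value g stamped by the previous node (0 at the first node)
        length:  packet length l in bits
        """
        if self.last_deadline is None:
            eligible = arrival               # assumption: first packet eligible on arrival
        else:
            eligible = max(arrival + ahead, self.last_deadline)   # Eq. (1)
        deadline = eligible + length / self.rate                  # Eq. (2)
        self.last_deadline = deadline
        return eligible, deadline


flow = JitterVCFlowState(rate=1000.0)
print(flow.on_arrival(arrival=0.0, ahead=0.0, length=500.0))
print(flow.on_arrival(arrival=0.2, ahead=0.1, length=1000.0))
```

The second packet arrives while the first one's deadline is still in the future, so its eligible time is pushed to that deadline. This dependence on `last_deadline` is exactly the per flow state that the next section sets out to remove from core nodes.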

Intuitively,  the  algorithm  eliminates  the  delay  variation  of  different  packets  by  forcing  all 
packets to incur the maximum allowable delay. The purpose of having $g_{i,j-1}^k$ is to compensate at
node $j$ the variation of delay due to load fluctuation at the previous node $j-1$. Such regulations
limit  the  traffic  burstiness  caused  by  network  load  fluctuations,  and  as  a  consequence,  reduce 
both  buffer  space  requirements  and  the  scheduler  complexity. 

It  has  been  shown  that  if  a  flow’s  long  term  arrival  rate  is  no  greater  than  its  reserved  rate,  a 
network  of  Virtual  Clock  servers  can  provide  the  same  delay  guarantee  to  the  flow  as  a  network 
of  WFQ  servers  [14,  17,  28].  In  addition,  it  has  been  shown  that  a  network  of  Jitter-VC  servers 
can  provide  the  same  delay  guarantees  as  a  network  of  Virtual  Clock  servers  [10,  16].  Therefore, 
a  network  of  Jitter-VC  servers  can  provide  the  same  guaranteed  service  as  a  network  of  WFQ 
servers.

3.2  Core-Jitter-VC  (CJVC)

In  this  section  we  propose  a  variant  of  Jitter-VC,  called  Core-Jitter-VC  (CJVC),  which  does  not 
require  per  flow  state  at  core  nodes.  In  addition,  we  show  that  a  network  of  CJVC  servers  can 
provide  the  same  guaranteed  service  as  a  network  of  Jitter-VC  servers. 

CJVC uses the DPS technique. The key idea is to have the ingress node encode scheduling
parameters in each packet’s header. The core routers can then make scheduling decisions based
on the parameters encoded in packet headers, thus eliminating the need for maintaining per flow
state at core nodes. As suggested by Eqs. (1) and (2), the Jitter-VC algorithm needs two state
variables for each flow $i$: $r_i$, which is the reserved rate for flow $i$, and $d_{i,j}^k$, which is the deadline
of the last packet from flow $i$ that was served by node $j$. While it is straightforward to eliminate
$r_i$ by putting it in the packet header, it is not trivial to eliminate $d_{i,j}^k$. The difference between $r_i$
and $d_{i,j}^k$ is that while all nodes along the path keep the same $r_i$ value for flow $i$, $d_{i,j}^k$ is a dynamic
value that is computed iteratively at each node. In fact, the eligible time and the deadline of $p_i^k$
depend on the deadline of the previous packet of the same flow, i.e., $d_{i,j}^{k-1}$.

A naive implementation using the DPS technique would be to pre-compute the eligible times
and the deadlines of the packet at all nodes along its path and insert all of them in the header.
This would eliminate the need for core nodes to maintain $d_{i,j}^k$. The main disadvantage of this
approach is that the amount of information carried by the packet increases with the number of
hops along the path. The challenge then is to design algorithms that compute $d_{i,j}^k$ for all nodes
while requiring a minimum amount of state in the packet header.

Notice that in Eq. (1), the reason for node $j$ to maintain $d_{i,j}^k$ is that it will be used to compute
the deadline and the eligible time of the next packet. Since it is only used in a max operation,
we can eliminate the need for $d_{i,j}^k$ if we can ensure that the other term in max is never less than
$d_{i,j}^k$. To this end, we use a slack variable associated with each packet, denoted $\delta_i^k$, such
that for every core node $j$ along the path, the following holds

$$a_{i,j}^k + g_{i,j-1}^k + \delta_i^k \geq d_{i,j}^{k-1}, \quad j > 1 \qquad (3)$$

By replacing the first term of max in Eq. (1) with $a_{i,j}^k + g_{i,j-1}^k + \delta_i^k$, the computation of the
eligible time reduces to

$$e_{i,j}^k = a_{i,j}^k + g_{i,j-1}^k + \delta_i^k, \quad j > 1 \qquad (4)$$

Therefore, by using one additional DPS variable, $\delta_i^k$, we eliminate the need for maintaining $d_{i,j}^k$ at
the core nodes.

The derivation of $\delta_i^k$ proceeds in two steps. First, we express the eligible time of packet $p_i^k$ at
an arbitrary core node $j$, $e_{i,j}^k$, as a function of the eligible time of $p_i^k$ at the ingress node, $e_{i,1}^k$ (see
Eq. (7)). Second, we use this result and Ineq. (3) to derive a lower bound for $\delta_i^k$.

Figure 3: The time diagram of the first two packets of flow $i$ along a four-node path under (a)
Jitter-VC, and (b) CJVC, respectively. Propagation times ($\pi_j$) and transmission times of maximum
size packets ($\tau_j$) are ignored.


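We now proceed with the first step. Recall that $g_{i,j-1}^k$ represents the time by which $p_i^k$
is transmitted before its schedule at node $j-1$, i.e., $g_{i,j-1}^k = d_{i,j-1}^k + \tau_{j-1} - s_{i,j-1}^k$, where $\tau_{j-1}$ is the
maximum time by which a packet can miss its deadline at node $j-1$. Let $\pi_{j-1}$ denote the
propagation delay between nodes $j-1$ and $j$. Then the arrival time of $p_i^k$ at node $j$, $a_{i,j}^k$, is given
by

$$a_{i,j}^k = s_{i,j-1}^k + \pi_{j-1} = d_{i,j-1}^k - g_{i,j-1}^k + \pi_{j-1} + \tau_{j-1}. \qquad (5)$$

By replacing $a_{i,j}^k$, given by the above expression, in Eq. (4), and then using Eq. (2), we obtain

$$e_{i,j}^k = d_{i,j-1}^k + \delta_i^k + \pi_{j-1} + \tau_{j-1} = e_{i,j-1}^k + \frac{l_i^k}{r_i} + \delta_i^k + \pi_{j-1} + \tau_{j-1}. \qquad (6)$$

By iterating over the above equation we express $e_{i,j}^k$ as a function of $e_{i,1}^k$:

$$e_{i,j}^k = e_{i,1}^k + (j-1)\left(\frac{l_i^k}{r_i} + \delta_i^k\right) + \sum_{m=1}^{j-1} (\pi_m + \tau_m), \quad j > 1 \qquad (7)$$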
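The step from the per hop recurrence to the closed form can be checked numerically. The sketch below assumes the recurrence reads "eligible time at node j equals eligible time at node j-1 plus l/r, plus the slack, plus the propagation and transmission delays of hop j-1", and compares it with the closed form of Eq. (7). Function names and sample values are hypothetical.

```python
def eligible_by_iteration(e1, length, rate, slack, prop, tx):
    """Apply the per hop recurrence (Eq. (6)) along the path.

    prop[m-1] and tx[m-1] hold pi_m and tau_m for hop m.
    Returns the eligible times at nodes 1, 2, ..., len(prop) + 1.
    """
    times = [e1]
    for pi_m, tau_m in zip(prop, tx):
        times.append(times[-1] + length / rate + slack + pi_m + tau_m)
    return times


def eligible_closed_form(e1, length, rate, slack, prop, tx, j):
    """Eq. (7): eligible time at node j (1-based) from the ingress eligible time."""
    hops = j - 1
    return e1 + hops * (length / rate + slack) + sum(prop[:hops]) + sum(tx[:hops])


prop = [0.25, 0.5, 0.25]       # pi_1..pi_3 (seconds, illustrative)
tx = [0.125, 0.125, 0.125]     # tau_1..tau_3
iterated = eligible_by_iteration(1.0, 1000.0, 1000.0, 0.5, prop, tx)
closed = [eligible_closed_form(1.0, 1000.0, 1000.0, 0.5, prop, tx, j) for j in (1, 2, 3, 4)]
print(iterated)
print(closed)
```

Both lists agree at every node, which is the property the derivation of the slack relies on: once the ingress eligible time and the slack are fixed, the eligible time at every downstream node is determined.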

We are now ready to compute $\delta_i^k$. Recall that the goal is to compute the minimum $\delta_i^k$ which
ensures that Ineq. (3) holds for every node along the path. After combining Ineq. (3), Eq. (4)
and Eq. (2) this reduces to ensuring that

$$e_{i,j}^k \geq d_{i,j}^{k-1} \;\Rightarrow\; e_{i,j}^k \geq e_{i,j}^{k-1} + \frac{l_i^{k-1}}{r_i}, \quad j > 1 \qquad (8)$$

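By plugging $e_{i,j}^k$ and $e_{i,j}^{k-1}$ as expressed by Eq. (7) into Ineq. (8), we get

$$\delta_i^k \geq \delta_i^{k-1} + \frac{l_i^{k-1} - l_i^k}{r_i} + \frac{e_{i,1}^{k-1} + l_i^{k-1}/r_i - e_{i,1}^k}{j-1}, \quad j > 1 \qquad (9)$$

From Eqs. (1) and (2) we have $e_{i,1}^k \geq d_{i,1}^{k-1} = e_{i,1}^{k-1} + l_i^{k-1}/r_i$. Thus, the right-hand side term in
Ineq. (9) is maximized when $j = h$, where $h$ is the number of hops on the path. As a result we compute $\delta_i^k$ as

$$\delta_i^1 = 0,$$
$$\delta_i^k = \max\left(0,\; \delta_i^{k-1} + \frac{l_i^{k-1} - l_i^k}{r_i} + \frac{e_{i,1}^{k-1} + l_i^{k-1}/r_i - e_{i,1}^k}{h-1}\right), \quad k > 1,\; h > 1. \qquad (10)$$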
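A small worked example may help in reading Eq. (10). The sketch below assumes the slack update takes the form used by the ingress algorithm of Figure 4: previous slack, plus the length difference divided by the rate, minus the idle time between the previous ingress deadline and the new ingress eligible time divided by h - 1, clamped at zero. The function name and the sample numbers are hypothetical.

```python
def next_slack(prev_slack, prev_length, length, rate,
               prev_eligible_ingress, eligible_ingress, hops):
    """Slack of packet k, computed at the ingress from the state left by packet k-1."""
    if hops < 2:
        raise ValueError("the slack update is stated for paths with h > 1")
    prev_deadline = prev_eligible_ingress + prev_length / rate   # ingress deadline of packet k-1
    idle = eligible_ingress - prev_deadline                      # non-negative at the ingress
    return max(0.0, prev_slack + (prev_length - length) / rate - idle / (hops - 1))


# Rate 1, three hops; packet 1 has length 2 and is eligible at time 0 (deadline 2).
print(next_slack(0.0, 2.0, 1.0, 1.0, 0.0, 2.0, hops=3))   # back-to-back, shorter packet
print(next_slack(0.0, 2.0, 1.0, 1.0, 0.0, 2.5, hops=3))   # same, after 0.5 idle time
```

A shorter packet following a longer one needs a positive slack, because without it the packet would catch up with its predecessor at downstream nodes. Idle time at the ingress reduces the required slack, and a longer packet following a shorter one drives it back toward zero.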


In this way, CJVC ensures that the eligible time of every packet $p_i^k$ at node $j$ is no smaller
than the deadline of the previous packet of the same flow at node $j$, i.e., $e_{i,j}^k \geq d_{i,j}^{k-1}$. In addition,
the Virtual Clock scheduler ensures that the deadline of every packet is not missed by more than
$\tau_j$ [40].

In Appendix B, we have shown that a network of CJVC servers provides the same worst case
delay  bounds  as  a  network  of  Jitter-VC  servers.  More  precisely,  we  have  proven  the  following 
result. 


Theorem  1  The  deadline  of  a  packet  at  the  last  hop  in  a  network  of  CJVC  servers  is  equal  to 
the  deadline  of  the  same  packet  in  a  corresponding  network  of  Jitter-VC  servers. 

The example in Figure 3 provides some intuition behind the above result. The basic observation
is that, with Jitter-VC, not counting the propagation delay, the difference between the
eligible time of packet $p_i^k$ at node $j$ and its deadline at the previous node $j-1$, i.e., $e_{i,j}^k - d_{i,j-1}^k$,
never decreases as the packet propagates along the path. Consider the second packet in Figure 3.
With Jitter-VC, the differences $e_{i,j}^2 - d_{i,j-1}^2$ (represented by the bases of the gray triangles) increase
in $j$. By introducing the slack variable $\delta_i^k$, CJVC equalizes these delays. While this change
may increase the delay of the packet at intermediate hops, it does not affect the end-to-end delay
bound.
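The statement of Theorem 1 can be checked on a small trace. The sketch below is only an illustration under simplifying assumptions that are not part of the theorem: propagation and transmission times are ignored (as in Figure 3), and every packet is assumed to leave each node exactly at its deadline, so that it arrives at the next node at its previous-node deadline with g = 0. Function names and the sample trace are hypothetical.

```python
def ingress_schedule(arrivals, lengths, rate):
    """Eligible times and deadlines at a node that keeps per flow state."""
    eligible, deadlines = [], []
    for k, (a, l) in enumerate(zip(arrivals, lengths)):
        e = a if k == 0 else max(a, deadlines[-1])
        eligible.append(e)
        deadlines.append(e + l / rate)
    return eligible, deadlines


def jitter_vc_last_deadlines(arrivals, lengths, rate, hops):
    """Last-node deadlines when every node keeps per flow state (Jitter-VC)."""
    _, d = ingress_schedule(arrivals, lengths, rate)
    for _ in range(hops - 1):
        # Under the stated assumptions, arrivals at the next node equal the
        # deadlines at the current node.
        _, d = ingress_schedule(d, lengths, rate)
    return d


def cjvc_last_deadlines(arrivals, lengths, rate, hops):
    """Last-node deadlines when core nodes use only the per packet slack (hops > 1)."""
    e1, d1 = ingress_schedule(arrivals, lengths, rate)
    slack = [0.0]
    for k in range(1, len(lengths)):
        idle = e1[k] - d1[k - 1]
        slack.append(max(0.0, slack[-1] + (lengths[k - 1] - lengths[k]) / rate
                         - idle / (hops - 1)))
    d = d1
    for _ in range(hops - 1):
        # Core node: eligible = previous deadline + slack; deadline = eligible + l/r.
        d = [dk + sk + lk / rate for dk, sk, lk in zip(d, slack, lengths)]
    return d


arrivals, lengths = [0.0, 2.5, 3.0], [2.0, 1.0, 3.0]
print(jitter_vc_last_deadlines(arrivals, lengths, rate=1.0, hops=3))
print(cjvc_last_deadlines(arrivals, lengths, rate=1.0, hops=3))
```

For this trace both functions return the same last-hop deadlines, although the second packet has a later deadline at the intermediate node under CJVC (5.25 instead of 5), which matches the remark that the slack may increase delay at intermediate hops without changing the end-to-end bound.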

Figure 4 shows the computation of the scheduling parameters $e_{i,j}^k$ and $d_{i,j}^k$ by a CJVC server.
The number of hops $h$ is computed at the admission time as discussed in Section 4.1.

3.3  Data  Path  Complexity 

While our algorithms do not maintain per flow state at core nodes, there is still the need for core
nodes to perform regulation and packet scheduling based on eligible times and deadlines. The
natural question to ask is: why is this a more scalable scheme than previous solutions requiring
per flow management?

    ingress node
    on packet p arrival
        i = get_flow(p);
        if (first_packet_of_flow(p, i))
            e_i = current_time;
            δ_i = 0;
        else
            δ_i = max(0, δ_i + (l_i - length(p))/r_i -
                         max(current_time - d_i, 0)/(h - 1));   /* Eq. (10) */
            e_i = max(current_time, d_i);
        l_i = length(p);
        d_i = e_i + l_i/r_i;
    on packet p transmission
        label(p) ← (r_i, d_i - current_time, δ_i);

    core/egress node
    on packet p arrival
        (r, g, δ) ← label(p);
        e = current_time + g + δ;   /* Eq. (4) */
        d = e + length(p)/r;
    on packet p transmission
        if (core node)
            label(p) ← (r, d - current_time, δ);
        else   /* this is an egress node */
            clear_label(p);

Figure 4: Algorithms performed by ingress, core, and egress nodes at packet arrival and departure.
Note that core and egress nodes do not maintain per flow state.
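The node algorithms of Figure 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the prototype implementation: the class names, the explicit `now` argument in place of a system clock, and the choice to hand the computed deadline back to the caller between arrival and transmission are all hypothetical. The flow table at the ingress is assumed to be filled in at admission time, and the path is assumed to have more than one hop.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Packet:
    flow_id: int
    length: float                                        # bits
    label: Optional[Tuple[float, float, float]] = None   # (r, g, slack)


@dataclass
class FlowState:
    rate: float                        # reserved rate r_i
    hops: int                          # path length h (> 1), fixed at admission time
    slack: float = 0.0
    length: float = 0.0                # length of the previous packet
    eligible: float = 0.0
    deadline: Optional[float] = None   # None until the first packet arrives


class IngressNode:
    """Keeps per flow state; this is the only place it lives."""

    def __init__(self) -> None:
        self.flows: Dict[int, FlowState] = {}

    def on_arrival(self, p: Packet, now: float) -> None:
        f = self.flows[p.flow_id]
        if f.deadline is None:                          # first packet of the flow
            f.eligible, f.slack = now, 0.0
        else:
            f.slack = max(0.0, f.slack + (f.length - p.length) / f.rate
                          - max(now - f.deadline, 0.0) / (f.hops - 1))   # Eq. (10)
            f.eligible = max(now, f.deadline)
        f.length = p.length
        f.deadline = f.eligible + f.length / f.rate

    def on_transmission(self, p: Packet, now: float) -> None:
        f = self.flows[p.flow_id]
        p.label = (f.rate, f.deadline - now, f.slack)


class CoreNode:
    """Keeps no per flow state; everything comes from the packet label."""

    def on_arrival(self, p: Packet, now: float) -> Tuple[float, float]:
        rate, ahead, slack = p.label
        eligible = now + ahead + slack                  # Eq. (4)
        return eligible, eligible + p.length / rate

    def on_transmission(self, p: Packet, deadline: float, now: float,
                        is_egress: bool = False) -> None:
        rate, _, slack = p.label
        p.label = None if is_egress else (rate, deadline - now, slack)
```

A core node in this sketch holds the eligible time and deadline only for packets currently queued, so its memory grows with the queue length rather than with the number of flows.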

There  are  several  scalability  bottlenecks  for  solutions  requiring  per  flow  management.  On 
the  data  path,  the  expensive  operations  are  per  flow  classification  and  scheduling.  On  the 
control  path,  the  complexity  is  the  maintenance  of  consistent  and  dynamic  state  in  a  distributed 
environment.  Among  the  three,  it  is  easiest  to  reduce  the  complexity  of  the  scheduling  algorithm 
as  there  is  a  natural  tradeoff  between  the  complexity  and  the  flexibility  of  the  scheduler  [35].  In 
fact,  a  number  of  techniques  have  already  been  proposed  to  reduce  the  scheduling  complexity, 
including  those  requiring  constant  time  complexity  [27,  36,  38]. 

We also note that due to the way we regulate traffic, it can be shown that with very high
probability,  the  number  of  packets  in  the  server  at  any  given  time  is  significantly  smaller  than 
the  number  of  flows.  This  will  further  reduce  the  scheduling  complexity  and  in  addition  reduce 
the  buffer  space  requirement.  More  precisely,  in  Appendix  C  we  prove  the  following  result. 

Theorem 2 Consider a server traversed by $n$ flows. Assume that the arrival times of the packets
from different flows are independent, and that all packets have the same size. Then, for any given
probability $\varepsilon$, the queue size at any time instant during a server busy period is asymptotically
bounded above by $s$, where

$$s = \sqrt{\beta n \left(\frac{\ln n}{2} - \frac{\ln \varepsilon}{2} - 1\right)}, \qquad (11)$$

with a probability larger than $1 - \varepsilon$. For identical reservations $\beta = 1$; for heterogeneous
reservations $\beta = 3$.

As an example, let $n = 10^6$ and $\epsilon = 10^{-10}$, which is the same order of magnitude as the
probability of a packet being corrupted at the physical layer. Then, by Eq. (11) we obtain
s = 4174 if all flows have identical reservations, and s = 7230 if flows have heterogeneous
reservations. Thus the probability of having more packets in the queue than specified by Eq. (11)
can be neglected at the level of the entire system even in the context of guaranteed services.
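As a check on the arithmetic in this example, Eq. (11) can be evaluated step by step for $n = 10^6$ and $\epsilon = 10^{-10}$. Rounding up to a whole number of packets is our assumption about how the quoted values were obtained.

```latex
\begin{align*}
\frac{\ln n}{2} - \frac{\ln \epsilon}{2} - 1
  &\approx 6.908 + 11.513 - 1 = 17.421, \\
s_{\beta = 1} &= \sqrt{1 \cdot 10^{6} \cdot 17.421} \approx 4173.8
  \;\Rightarrow\; 4174, \\
s_{\beta = 3} &= \sqrt{3 \cdot 10^{6} \cdot 17.421} \approx 7229.2
  \;\Rightarrow\; 7230.
\end{align*}
```

In both cases the bound stays below one percent of the number of flows.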

In Table 2 we compare the bounds given by Eq. (11) to simulation results. In each case we
report the maximum queue size achieved during the first n time slots of a busy period over $10^5$
independent trials. We note that in the case of all flows having identical reservations we are
guaranteed that if the queue does not overflow during the first n time slots of a busy period, it
will not overflow during the rest of the busy period (see Corollary 1). Since the probability that
a buffer overflows during the first n time slots is no larger than n times the probability that
it overflows during an arbitrary time slot, we use $\epsilon = 10^{-5}/n$ to compute the corresponding
bounds.^

(a) identical reservations

    # flows (n)    bound (s)    max. queue size
    100            31           28
    1,000          109          [illegible]
    10,000         374          284
    100,000        1276         880
    1,000,000      4310         2900

(b) reservations differing by a factor of 20

    # flows (n)    bound (s)    max. queue size
    100            53           30
    1,000          188          95
    10,000         648          309
    100,000        2210         904
    1,000,000      7465         2944

Table 2: The upper bound of the queue size, s, computed by Eq. (11) for $\epsilon = 10^{-5}/n$ (where n is the
number of flows) versus the maximum queue size achieved during the first n time slots of a busy period
over $10^5$ independent trials: (a) when all flows have identical reservations; (b) when flows' reservations
differ by a factor of 20.

The results show that our bounds are reasonably close (within a factor of two) when all
reservations are identical, but are more conservative when the reservations are different. Finally,
we make three comments. First, by performing per packet regulation at every core node, the
bounds given by Eq. (11) hold for any core node and are independent of the path length. Second,
if the flows' arrival patterns are not independent, we can easily enforce independence by randomly
delaying the first packet from each backlogged period of the flow at ingress nodes. This will increase
the end-to-end packet delay by at most the queueing delay of one extra hop. Third, the bounds
given by Eq. (11) are asymptotic. In particular, in proving the results in Appendix C we make
the assumption that $n \gg s$. However, this is a reasonable assumption in practice, as the most
interesting cases involve high values for n, and, as suggested by Eq. (11) and the results in
Table 2, even for small values of $\epsilon$ (e.g., $10^{-10}$), n is much larger than s in these cases.

4  Admission  Control  With  No  Per  Flow  State 

A key component of any architecture that provides guaranteed services is admission control.
The main job of admission control is to ensure that the network resources are not
over-committed. In particular, it has to ensure that the sum of the reservation rates of all flows
that traverse any link in the network is no larger than the link capacity, i.e., $\sum_i r_i \le C$. A
new reservation request is granted if it passes the admission test at each hop along its path. As
discussed in Section 2, implementing such a functionality is not trivial: traditional distributed
architectures based on signaling protocols are not scalable and are less robust due to the
requirement of maintaining dynamic and replicated state; centralized architectures have scalability
and availability concerns.

^More formally, let $\epsilon'$ be the probability that the buffer overflows during the first n time slots of the
busy period. Then by taking $\epsilon' = n \cdot \epsilon$, Eq. (11) becomes $s = \sqrt{\beta n (\ln n - (\ln \epsilon')/2 - 1)}$.

Figure 5: Ingress-egress admission control when RSVP is used outside the SCORE domain.

In  this  section,  we  propose  a  fully  distributed  architecture  for  implementing  admission  control. 
Like  most  distributed  admission  control  architectures,  in  our  solution,  each  node  keeps  track  of 
the  aggregate  reservation  rate  for  each  of  its  out-going  links  and  makes  local  admission  control 
decisions.  However,  unlike  existing  reservation  protocols,  this  distributed  admission  control 
process  is  achieved  without  core  nodes  maintaining  per  flow  state. 

4.1  Ingress-to-Egress  Admission  Control 

We  consider  an  architecture  in  which  a  lightweight  signaling  protocol  is  used  within  the  SCORE 
domain.  Edge  routers  are  the  interface  between  this  signaling  protocol  and  an  inter-domain 
signaling protocol such as RSVP. For the purpose of this discussion, we consider only unicast
reservations.  In  addition,  we  assume  a  mechanism  like  the  one  proposed  in  [30]  or  Multi-Protocol 
Label  Switching  (MPLS)  [4]  that  can  be  used  to  pin  a  flow  to  a  route. 

From  the  point  of  view  of  RSVP,  a  path  through  the  SCORE  domain  is  just  a  virtual  link. 
There  are  two  basic  control  messages  in  RSVP:  Path  and  Resv.  These  messages  are  processed 
only  by  edge  nodes;  no  operations  are  performed  inside  the  domain.  For  the  ingress  node,  upon 
receiving  a  Path  message,  it  simply  forwards  it  through  the  domain.  For  the  egress  node,  upon 
receiving  the  first  Resv  message  for  a  flow  (i.e.,  there  was  no  RSVP  state  for  the  flow  at  the 
egress  node  before  receiving  the  message),  it  will  forward  the  message  (message  “1”  in  Figure  5) 
to  the  corresponding  ingress  node,  which  in  turn  will  send  a  special  signaling  message  (message 
“2”  in  Figure  5)  along  the  path  toward  the  egress  node.  Upon  receiving  the  signaling  message, 
each  node  along  the  path  performs  a  local  admission  control  test  as  described  in  Section  4.2.  In 
addition, the message carries a counter h that is incremented at each hop. The final value of h
is used for computing the slack delay $\delta$ (see Eq. (10)). If we use the route pinning mechanism
described in [30], message “2” is also used to compute the label of the path between the ingress
and egress. This label is then used by the ingress node to make sure that all data packets of
the flow are forwarded along the same path. When signaling message “2” reaches the egress
node, it is reflected back to the sender, which makes the final decision (message “3” in Figure 5).
RSVP  refresh  messages  for  a  flow  that  already  has  per  flow  RSVP  state  installed  at  edge  routers 
will  not  trigger  additional  signaling  messages  inside  the  domain. 
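The traversal of signaling message “2” described above can be sketched in C as follows. This is only an illustration under stated assumptions: the struct layouts and function names are hypothetical, the local test is taken to be the aggregate check $R_{bound} + r \le C$ discussed in Section 4.2, and we assume the message keeps travelling (and counting hops) after a rejection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-link state at a core node: aggregate values only. */
struct link_state {
    double capacity;  /* link capacity C */
    double r_bound;   /* upper bound on the aggregate reserved rate */
};

/* Hypothetical contents of signaling message "2". */
struct sig_msg {
    double rate;      /* requested reservation rate r */
    int    hops;      /* counter h, incremented at each hop */
    bool   admitted;  /* false once any hop rejects the request */
};

/* Process the message at one node: hop count plus local admission test. */
static void process_at_node(struct link_state *link, struct sig_msg *msg)
{
    assert(msg->rate > 0.0);
    msg->hops++;
    if (!msg->admitted)
        return;                       /* already rejected upstream */
    if (link->r_bound + msg->rate <= link->capacity)
        link->r_bound += msg->rate;   /* accepted locally */
    else
        msg->admitted = false;        /* upstream hops keep their update */
}

/* Carry the message from ingress to egress along a pinned path. */
static struct sig_msg signal_path(struct link_state *path, size_t n_links,
                                  double rate)
{
    struct sig_msg msg = { rate, 0, true };
    for (size_t i = 0; i < n_links; i++)
        process_at_node(&path[i], &msg);
    return msg;   /* reflected back so the final decision can be made */
}
```

Note that a request rejected at a later hop leaves the earlier hops with an inflated bound; this is a partial reservation failure of the kind Section 4.2 sets out to tolerate.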

Since  RSVP  uses  raw  IP  or  UDP  to  send  control  messages,  there  is  no  need  for  retransmission 
for  our  signaling  messages,  as  message  loss  will  not  break  the  RSVP  semantics.  If  the  sender 
does  not  receive  a  reply  after  a  certain  timeout,  it  simply  drops  the  Resv  message.  In  addition,  as 
we will show in Section 4.3, there is no need for a special termination message inside the domain
when  a  flow  is  torn  down. 

4.2  Per-Hop  Admission  Control 

Each node needs to ensure that $\sum_i r_i \le C$ holds at all times. At first sight, one simple solution
that implements this test and also avoids per flow state is for each node to maintain the aggregate
reserved rate R, where R is updated to R = R + r when a new flow with the reservation rate
r is admitted, and to R = R - r when a flow with the reservation rate r terminates.
Admission control then reduces to checking whether $R + r \le C$ holds. However, it can be easily
shown that such a simple solution is not robust with respect to various failure conditions such as
packet loss, partial reservation failures, and network node crashes. To handle packet loss, when
a node receives a set-up or tear-down message, the node has to be able to tell whether it is a
duplicate of a message already processed. To handle partial reservation failures, a node needs
to “remember” what decision it made for the flow in a previous pass. That is why all existing
solutions maintain per flow reservation state, be it hard state as in ATM UNI or soft state as
in RSVP. However, maintaining consistent and dynamic state in a distributed environment is
itself challenging. Fundamentally, this is due to the fact that the update operations assume
transaction semantics, which are difficult to implement in a distributed environment [1, 34].
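To make the fragility of the single-counter scheme concrete, the following C sketch implements it literally. The type and function names are hypothetical and serve only this illustration; the point is that a node without per flow memory cannot tell a retransmitted message from a new one.

```c
#include <assert.h>

/* Naive core-node state: one aggregate counter, no per-flow memory. */
struct naive_link {
    double capacity;  /* link capacity C */
    double reserved;  /* aggregate reserved rate R */
};

/* Set-up message for rate r: returns 1 if admitted, 0 otherwise. */
static int naive_setup(struct naive_link *l, double r)
{
    assert(r > 0.0);
    if (l->reserved + r > l->capacity)
        return 0;
    l->reserved += r;   /* a retransmitted set-up is counted again */
    return 1;
}

/* Tear-down message for rate r. */
static void naive_teardown(struct naive_link *l, double r)
{
    l->reserved -= r;   /* a retransmitted tear-down is subtracted again */
}
```

For example, if two flows of rate 4 are admitted on a link of capacity 10 and the tear-down of the first flow is processed twice, R drops to 0 while the second flow still holds its reservation. A later request for rate 10 would then be admitted and the link would be over-committed.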

In the remainder of this section, we show that by using DPS, it is possible to significantly
reduce  the  complexity  of  admission  control  in  a  distributed  environment.  Before  we  present  the 
details  of  the  algorithm,  we  point  out  that  our  goal  is  to  estimate  a  close  upper  bound  on  the 
aggregate  reserved  rate.  By  using  this  bound  in  the  admission  test  we  avoid  over-provisioning, 
which  is  a  necessary  condition  to  provide  deterministic  service  guarantees.  This  is  in  contrast 
to  many  measurement-based  admission  control  algorithms  [19,  32],  which,  in  the  context  of 
supporting  controlled  load  or  statistical  services,  base  their  admission  test  on  the  measurement  of 
the  actual  amount  of  traffic  transmitted.  To  achieve  this  goal,  our  algorithm  uses  two  techniques. 




Notation          Comments

$r_i$             flow i's reserved rate
$b_i^k$           total number of bits flow i is entitled to transmit during
                  $(t_i^{k-1}, t_i^k]$, i.e., $b_i^k = r_i (t_i^k - t_i^{k-1})$
$R(t)$            aggregate reservation at time t
$R_{bound}(t)$    upper bound of $R(t)$, used by admission test
$R_{DPS}(t)$      estimate of $R(t)$, computed by using DPS
$R_{new}(t)$      sum of all new reservations accepted from the
                  beginning of current estimation interval until t
$R_{cal}(t)$      upper bound of $R(t)$, used to calibrate $R_{bound}$,
                  computed based on $R_{DPS}$ and $R_{new}$

Table 3: Notations used in Section 4.3.


First, a conservative upper bound of R, denoted $R_{bound}$, is maintained at each core node and
is used for making admission control decisions. $R_{bound}$ is updated with a simple rule: $R_{bound}$ =
$R_{bound}$ + r whenever a new request of a rate r is accepted. It should be noted that in order
to maintain the invariant that $R_{bound}$ is an upper bound of R, this algorithm does not need to
detect duplicate request messages, generated either due to retransmission in case of packet loss
or retry in case of partial reservation failures. Of course, the obvious problem with this algorithm
is that $R_{bound}$ will diverge from R. In the limit, when $R_{bound}$ reaches the link capacity C, no new
requests can be accepted even though there might be available capacity.

To address this problem, a separate algorithm is introduced to periodically estimate the
aggregate reserved rate. Based on this estimate, a second upper bound for R, denoted $R_{cal}$,
is computed and used to re-calibrate $R_{bound}$. An important aspect of the estimation algorithm
is that the discrepancy between the upper bound $R_{cal}$ and the actual reserved rate R can be
bounded. The re-calibration then becomes choosing the minimum of the two upper bounds
$R_{bound}$ and $R_{cal}$. The estimation algorithm is based on DPS and does not require core routers to
maintain per flow state.

Our  algorithms  have  several  important  properties.  First,  they  are  robust  in  the  presence  of 
network  losses  and  partial  reservation  failures.  Second,  while  they  can  over-estimate  R,  they 
will never under-estimate R. This ensures the semantics of the guaranteed service: while
over-estimation can lead to under-utilization of network resources, under-estimation can result in
over-provisioning and violation of performance guarantees. Finally, the proposed estimation
algorithms are self-correcting in the sense that over-estimation in a previous period will be
corrected in the next period. This greatly reduces the possibility of serious resource
under-utilization.


4.3  Aggregate  Reservation  Estimation  Algorithm 

In this section, we present the estimation algorithm of the aggregate reserved rate which is
performed at each core node. In particular, we will describe how $R_{cal}$ is computed and how it is
used to re-calibrate $R_{bound}$. In designing the algorithm for computing $R_{cal}$, we want to balance
between two goals: (a) $R_{cal}$ should be an upper bound on R; (b) over-estimation errors should
be corrected and kept to the minimum.

To compute $R_{cal}$, we start with an inaccurate estimate of R, denoted $R_{DPS}$, and then make
adjustments to account for estimation inaccuracies. In the following, we first present the
algorithm that computes $R_{DPS}$, then describe the possible inaccuracies and the corresponding
adjustment algorithms.

The estimate $R_{DPS}$ is calculated using the DPS technique: ingress nodes insert additional
state in packet headers, which is in turn used by core nodes to estimate the aggregate reservation
R. In particular, the following state is inserted in the header of packet $p_i^k$:

$$ b_i^k = r_i (t_i^k - t_i^{k-1}), \qquad (12) $$

where $t_i^{k-1}$ and $t_i^k$ are the times the packets $p_i^{k-1}$ and $p_i^k$ are transmitted by the ingress node.
Therefore, $b_i^k$ represents the total amount of bits that flow i is entitled to send during the interval
$(t_i^{k-1}, t_i^k]$. The computation of $R_{DPS}$ is based on the following simple observation: the sum of b
values of all packets of flow i during an interval is a good approximation for the total number of
bits that flow i is entitled to send during that interval according to its reserved rate. Similarly, the
sum of b values of all packets is a good approximation for the total number of bits that all flows
are entitled to send during the corresponding interval. Dividing this sum by the length of the
interval gives the aggregate reservation rate. More precisely, let us divide time into intervals of
length $T_W$: $(u_k, u_{k+1}]$, $k \ge 0$. Let $b_i(u_k, u_{k+1})$ be the sum of b values of packets in flow i received
during $(u_k, u_{k+1}]$, and let $B(u_k, u_{k+1})$ be the sum of b values of all packets during $(u_k, u_{k+1}]$. The
estimate is then computed at the end of each interval $(u_k, u_{k+1}]$ as follows:

$$ R_{DPS}(u_{k+1}) = \frac{B(u_k, u_{k+1})}{u_{k+1} - u_k} = \frac{B(u_k, u_{k+1})}{T_W}. \qquad (13) $$
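Eqs. (12) and (13) can be sketched in C as follows. The names `dps_b_value`, `dps_estimator`, `dps_on_packet` and `dps_on_timeout` are hypothetical and are not taken from the prototype; times are in seconds and rates in bits per second. The sketch ignores jitter and mid-interval reservation changes.

```c
#include <assert.h>

/* Ingress side: b value stamped on a packet sent at t_now, when the
 * previous packet of the same flow was sent at t_prev (Eq. (12)). */
static double dps_b_value(double rate, double t_prev, double t_now)
{
    assert(t_now >= t_prev);
    return rate * (t_now - t_prev);
}

/* Core side: one accumulator per outgoing link, no per-flow state. */
struct dps_estimator {
    double window;  /* T_W, the length of the estimation interval */
    double sum_b;   /* running sum of b values in the current interval */
};

/* Called for every arriving packet, whatever flow it belongs to. */
static void dps_on_packet(struct dps_estimator *e, double b)
{
    e->sum_b += b;
}

/* End of interval: return R_DPS (Eq. (13)) and start a new interval. */
static double dps_on_timeout(struct dps_estimator *e)
{
    double r_dps = e->sum_b / e->window;
    e->sum_b = 0.0;
    return r_dps;
}
```

With one flow reserved at 1000 bit/s sending every 0.25 s and another reserved at 500 bit/s sending every 0.5 s, the b values collected over a one-second window sum to 1500 bits, so the estimate equals the aggregate reservation in this idealized case.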


While simple, the above algorithm may introduce two types of inaccuracies. First, it ignores
the effects of the delay jitter and the packet inter-departure times. Second, it does not consider
the effects of accepting or terminating a reservation in the middle of an estimation interval. In
particular, having newly accepted flows in the interval may result in the under-estimation of $R(t)$
by $R_{DPS}(t)$. To illustrate this, consider the following simple example: there are no guaranteed
flows on a link until a new request with rate r is accepted at the end of an estimation interval
$(u_k, u_{k+1}]$. If no data packet from the new flow reaches the node before $u_{k+1}$, $B(u_k, u_{k+1})$ would
be 0, and so would be $R_{DPS}(u_{k+1})$. However, the correct value should be r.

Figure 6: The scenario in which the lower bound of $b_i$, i.e., $r_i(T_W - T_I - T_J)$, is achieved. The arrows
represent packet transmissions. $T_W$ is the averaging window size; $T_I$ is an upper bound on the packet
inter-departure time; $T_J$ is an upper bound on the delay jitter. Both m1 and m2 miss the estimation
interval $T_W$.

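In the following, we present the algorithm to compute an upper bound of $R(u_{k+1})$, denoted
$R_{cal}(u_{k+1})$. In doing this we account for both types of inaccuracies. Let $\mathcal{C}(t)$ denote the set of
reservations at time t. Our goal is then to bound the aggregate reservation at time $u_{k+1}$, i.e.,
$R(u_{k+1}) = \sum_{i \in \mathcal{C}(u_{k+1})} r_i$. Consider the division of $\mathcal{C}(u_{k+1})$ into two subsets: the subset of new
reservations that were accepted during the interval $(u_k, u_{k+1}]$, denoted $\mathcal{N}(u_{k+1})$, and the subset
$\mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1})$ containing the rest of the reservations, which were accepted no later than $u_k$.
Next, we express $R(u_{k+1})$ as

$$ R(u_{k+1}) = \sum_{i \in \mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1})} r_i + \sum_{i \in \mathcal{N}(u_{k+1})} r_i. \qquad (14) $$

The idea is then to derive an upper bound for each of the two right-hand side terms, and compute
$R_{cal}$ as the sum of these two bounds. To bound $\sum_{i \in \mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1})} r_i$ we note that

$$ B(u_k, u_{k+1}) \ge \sum_{i \in \mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1})} b_i(u_k, u_{k+1}). \qquad (15) $$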


The reason that (15) is an inequality instead of an equality is that when there are flows
terminating during the interval $(u_k, u_{k+1}]$, their packets may still have contributed to $B(u_k, u_{k+1})$
even though they do not belong to $\mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1})$. Next, we compute a lower bound for
$b_i(u_k, u_{k+1})$. By definition, since $i \in \mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1})$, it follows that flow i holds a reservation
during the entire interval $(u_k, u_{k+1}]$. Let $T_I$ be the maximum inter-departure time between two
consecutive packets of a flow at the edge node, and let $T_J$ be the maximum delay jitter of a flow,
where both $T_I$ and $T_J$ are much smaller than $T_W$. Now, consider the scenario shown in Figure 6
in which a core node receives the packets m1 and m2 just outside the estimation window.
Assuming the worst case in which m1 incurs the lowest possible delay, m2 incurs the maximum
possible delay, and the last packet before m2 departs $T_I$ seconds earlier, it is easy to see
that the sum of the b values carried by the packets received during the estimation interval
by the core node cannot be smaller than $r_i(T_W - T_I - T_J)$. Thus, we have

$$ b_i(u_k, u_{k+1}) \ge r_i (T_W - T_I - T_J), \quad \forall i \in \mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1}). \qquad (16) $$

By combining Ineqs. (15) and (16), and Eq. (13) we obtain

$$ \sum_{i \in \mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1})} r_i \;\le\; \frac{\sum_{i \in \mathcal{C}(u_{k+1}) \setminus \mathcal{N}(u_{k+1})} b_i(u_k, u_{k+1})}{T_W (1 - f)} \qquad (17) $$

$$ \le\; \frac{R_{DPS}(u_{k+1})}{1 - f}, \qquad (18) $$

where $f = (T_I + T_J)/T_W$.

Next, we bound the second right-hand side term in Eq. (14): $\sum_{i \in \mathcal{N}(u_{k+1})} r_i$. For this, we
introduce a new global variable $R_{new}$. $R_{new}$ is initialized at the beginning of each interval $(u_k, u_{k+1}]$ to
zero, and is updated to $R_{new} + r$ every time a new reservation r is accepted. Let $R_{new}(t)$ denote
the value of this variable at time t. For simplicity, here we assume that a flow which is granted
a reservation during the interval $(u_k, u_{k+1}]$ becomes active no later than $u_{k+1}$.^ Then it is easy
to see that

$$ \sum_{i \in \mathcal{N}(u_{k+1})} r_i \le R_{new}(u_{k+1}). \qquad (19) $$

Equality holds when no duplicate reservation requests are processed, and none of the newly
accepted reservations terminate during the interval. Then we define $R_{cal}(u_{k+1})$ as

$$ R_{cal}(u_{k+1}) = \frac{R_{DPS}(u_{k+1})}{1 - f} + R_{new}(u_{k+1}). \qquad (20) $$


From Eq. (14), and Ineqs. (18) and (19), it follows easily that $R_{cal}(u_{k+1})$ is an upper bound for
$R(u_{k+1})$, i.e., $R_{cal}(u_{k+1}) \ge R(u_{k+1})$. Finally, we use $R_{cal}(u_{k+1})$ to re-calibrate the upper bound
of the aggregate reservation, $R_{bound}$, at $u_{k+1}$ as

$$ R_{bound}(u_{k+1}) = \min(R_{bound}(u_{k+1}), R_{cal}(u_{k+1})). \qquad (21) $$

^Otherwise, to account for the worst case in which a reservation that was accepted by the node during $(u_{k-1}, u_k]$
becomes active at time $u_k + RTT$, we need to subtract $RTT \times R_{new}(u_k)$ from $B(u_k, u_{k+1})$.

Per-hop Admission Control

    on reservation request r
        if (R_bound + r <= C)    /* perform admission test */
            R_new = R_new + r;
            R_bound = R_bound + r;
            accept request;
        else
            deny request;
    on reservation termination r    /* optional */
        R_bound = R_bound - r;

Aggregate Reservation Bound Comp.

    on packet arrival p
        b = get_b(p);    /* get b value inserted by ingress (Eq. (12)) */
        L = L + b;
    on time-out T_W
        R_DPS = L / T_W;    /* estimate aggregate reservation */
        R_bound = min(R_bound, R_DPS / (1 - f) + R_new);
        R_new = 0;

Figure 7: The control path algorithms executed by core nodes; $R_{new}$ is initialized to 0.

Figure  7  shows  the  pseudocode  of  control  algorithms  at  core  nodes.  Next  we  make  several 
observations. 

First, the estimation algorithm uses only the information in the current interval. This makes
the algorithm robust with respect to loss and duplication of signaling packets since their effects
are “forgotten” after one time interval. As an example, if a node processes both the original and
a duplicate of the same reservation request during the interval $(u_k, u_{k+1}]$, $R_{bound}$ will be updated
twice for the same flow. However, this erroneous update will not be reflected in the computation
of $R_{DPS}(u_{k+2})$, since its computation is based only on the b values received during $(u_{k+1}, u_{k+2}]$.
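The duplicate-request example above can be traced with a small C rendering of the core-node control path. This is a sketch under assumptions, not the prototype code: the struct and function names are hypothetical, and the reset of the accumulated b sum at each time-out is our assumption, since the pseudocode only shows the reset of $R_{new}$.

```c
#include <assert.h>

/* Hypothetical per-link state kept by a core node. */
struct core_link {
    double capacity;  /* C */
    double r_bound;   /* R_bound, used by the admission test */
    double r_new;     /* R_new, reservations accepted this interval */
    double sum_b;     /* L, sum of b values seen this interval */
    double window;    /* T_W */
    double f;         /* (T_I + T_J) / T_W */
};

/* Admission test; duplicates are not detected and are counted again. */
static int on_request(struct core_link *l, double r)
{
    assert(r > 0.0);
    if (l->r_bound + r > l->capacity)
        return 0;
    l->r_new += r;
    l->r_bound += r;
    return 1;
}

/* Accumulate the b value carried by an arriving data packet. */
static void on_packet(struct core_link *l, double b)
{
    l->sum_b += b;
}

/* End of an estimation interval: Eqs. (13), (20) and (21). */
static void on_timeout(struct core_link *l)
{
    double r_dps = l->sum_b / l->window;
    double r_cal = r_dps / (1.0 - l->f) + l->r_new;
    if (r_cal < l->r_bound)
        l->r_bound = r_cal;
    l->r_new = 0.0;
    l->sum_b = 0.0;   /* assumption: start the next interval from zero */
}
```

With a one-second window and f = 1/16, a request for rate 30 that is processed twice raises $R_{bound}$ to 60. If the flow then delivers b values summing to 30 in each interval, $R_{bound}$ stays at 60 after the first time-out and falls to 30/(1 - 1/16) = 32 after the second, which is still an upper bound on the true reservation of 30.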

As a consequence, an important property of our admission control algorithm is that it can
asymptotically reach a link utilization of $C(1 - f)/(1 + f)$. In particular, the following result is
proven in Appendix D:

Theorem 3 Consider a link of capacity C at time t. Assume that no reservation terminates
and there are no reservation failures or request losses after time t. Then if there is sufficient
demand after t the link utilization approaches asymptotically $C(1 - f)/(1 + f)$.
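To give a feel for the size of this limit, consider purely illustrative values that are not taken from the experiments: an estimation window $T_W = 5$ s and $T_I + T_J = 0.25$ s.

```latex
\begin{align*}
f &= \frac{T_I + T_J}{T_W} = \frac{0.25}{5} = 0.05, \\
\frac{C(1 - f)}{1 + f} &= C \cdot \frac{0.95}{1.05} \approx 0.905\,C.
\end{align*}
```

Under these assumed values about 90% of the link capacity can be allocated, and the fraction approaches 1 as f decreases.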

Second, note that since $R_{cal}(u_k)$ is an upper bound of $R(u_k)$, a simple solution would be to use
$R_{cal}(u_k) + R_{new}$, instead of $R_{bound}$, to perform the admission test during $(u_k, u_{k+1}]$. The problem
with this approach is that $R_{cal}$ can overestimate the aggregate reservation R. An example is
given in Section 5.3 to illustrate this issue (Figure 13(b)).

Third,  we  note  that  a  possible  optimization  of  the  admission  control  algorithm  is  to  add 
reservation  termination  messages  (see  Figure  7).  This  will  reduce  the  discrepancy  between  the 
upper  bound  Rbound  and  the  aggregate  reservation  R.  However,  in  order  to  guarantee  that  Rbound 
remains  an  upper  bound  for  R,  we  need  to  ensure  that  a  termination  message  is  sent  at  most 
once,  i.e.,  there  are  no  retransmissions  if  the  message  is  lost.  In  practice,  this  property  can  be 
enforced  by  edge  nodes,  which  maintain  per  flow  state. 

Finally, to ensure that the maximum inter-departure time is no larger than $T_I$, the ingress
node may need to send a dummy packet in the case when no data packet arrives for a flow during
an interval $T_I$. This can be achieved by having the ingress node maintain a timer for each
flow. An optimization would be to aggregate all “micro-flows” between each pair of ingress and
egress nodes into one flow, compute b values based on the aggregated reservation rate, and
insert a dummy packet only if there is no data packet of the aggregate flow during an interval.

5  Implementation  and  Experiments 

The  key  technique  of  our  algorithms  is  DPS,  which  encodes  states  in  the  packet  header,  and  thus 
eliminates  the  need  for  maintaining  per  flow  state  at  each  node.  Since  there  is  limited  space  in 
protocol  headers  and  most  header  bits  have  been  allocated,  the  main  challenge  of  implementing 
these  algorithms  is  to  (a)  find  space  in  the  packet  header  for  storing  DPS  variables  and  at  the  same 
time  remain  fully  compatible  with  current  standards  and  protocols;  and  (b)  efficiently  encode 
state  variables  so  that  they  fit  in  the  available  space  without  introducing  too  much  inaccuracy. 

In the remainder of this section, we first present how we address the above two problems in
the context of IPv4 networks, then describe a prototype implementation of our algorithms in FreeBSD
v2.2.6, and finally give results from experiments in a local testbed. The main goal of these
experiments is to provide a proof of concept of our design.

5.1 Carrying State in Data Packets

Two possibilities to encode state in the packet header are: (1) introduce a new IP option and
insert the option at the ingress router, or (2) introduce a new header between layer 2 and layer 3,
similar to the way labels are transported in Multi-Protocol Label Switching (MPLS) [4]. While
both of these solutions are quite general and can potentially provide large space for encoding
state variables, for the purpose of our implementation we consider a third option: store the state
in the IP header. By doing this, we avoid the penalty imposed by most IPv4 routers in processing
the IP options, or the need to devise different solutions for different technologies, as would
have been required by introducing a new header between layer 2 and layer 3.

The biggest problem with using the IP header is to find enough space to insert the extra
information. The main challenge is to remain compatible with current standards and protocols.
In particular, we want the network domain to be transparent to end-to-end protocols, i.e., the
egress node should restore the fields changed by ingress and core nodes to their original values.
To achieve this goal, we first use four bits from the type of service (TOS) byte (now renamed the
Differentiated Service (DS) field), which are specifically allocated for local and experimental
use [21]. In addition, we observe that there is an ip_off field of 13 bits in the IPv4 header to
support packet fragmentation/reassembly, which is rarely used. For example, by analyzing the
traces of over 1.7 million packets on an OC-3 link [23], we found that less than 0.22% of all
packets were fragments. Therefore, in most cases it is possible to use the ip_off field to encode the
DPS values. This idea can be implemented as follows. When a packet arrives at an ingress
node, the node checks whether the packet is a fragment or needs to be fragmented. If neither of
these is true, the ip_off field in the packet header will be used to encode DPS values. When
the packet reaches the egress node, the ip_off field is cleared. Otherwise, if the packet is a fragment,
it is forwarded as a best-effort packet. In this way the use of ip_off is transparent outside the
domain. We believe that forwarding a fragment as a best-effort packet is acceptable in practice,
as end-points can easily avoid fragmentation by using an MTU discovery mechanism. Also note
that in the above we implicitly assume that packets can be fragmented only by egress nodes.
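The ingress, core and egress handling of ip_off described above can be sketched in C as follows. This is a minimal illustration, not the FreeBSD prototype: the function names and the `mtu` parameter are hypothetical, the 16-bit word is assumed to be in host byte order, and the masks follow the standard IPv4 layout (a "more fragments" flag at 0x2000 and a 13-bit fragment offset at 0x1fff).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 flags/fragment-offset word, host byte order. */
#define FRAG_MF       0x2000u  /* "more fragments" flag */
#define FRAG_OFFMASK  0x1fffu  /* 13-bit fragment offset */

/* A packet is a fragment if MF is set or its offset is non-zero. */
static bool is_fragment(uint16_t off_word)
{
    return (off_word & (FRAG_MF | FRAG_OFFMASK)) != 0;
}

/* Ingress: store 13 bits of DPS state in the offset bits. Returns false
 * if the packet is a fragment or would need fragmenting; such a packet
 * is left untouched and forwarded as best-effort. */
static bool ingress_encode(uint16_t *off_word, uint16_t total_len,
                           uint16_t mtu, uint16_t dps_state)
{
    assert(dps_state <= FRAG_OFFMASK);
    if (is_fragment(*off_word) || total_len > mtu)
        return false;
    *off_word = (uint16_t)((*off_word & ~FRAG_OFFMASK) | dps_state);
    return true;
}

/* Core: read back the DPS state. */
static uint16_t core_read(uint16_t off_word)
{
    return (uint16_t)(off_word & FRAG_OFFMASK);
}

/* Egress: clear the offset bits so the use of ip_off stays invisible
 * outside the domain; the flag bits are preserved. */
static void egress_restore(uint16_t *off_word)
{
    *off_word &= (uint16_t)~FRAG_OFFMASK;
}
```

Inside the domain a packet with non-zero offset bits would look like a fragment to an unmodified router, so core nodes must rely on the TOS bits to tell that the field carries DPS state.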

In  summary,  we  have  up  to  17  bits  available  in  the  current  IPv4  header  to  encode  four  state 
variables.  The  next  section  discusses  how  we  use  this  space  to  encode  the  DPS  states. 


5.2  State  Encoding 

There are four pieces of state that need to be encoded: three are for scheduling purposes, (1) the
reserved rate r or equivalently l/r, (2) the slack delay $\delta$ as computed by Eq. (10), and (3) the
amount of time g by which the packet was transmitted ahead of schedule at the previous node;
and one for admission control purposes, (4) b, as computed by Eq. (12). All are positive values.

    /* m: number of mantissa bits; n: number of exponent bits */
    void intToFP(int val, int *mantissa, int *exponent) {
        int nbits = get_num_bits(val);
        if (nbits <= m) {
            /* val < 2^m: store it directly and flag this with exponent 2^n - 1 */
            *mantissa = val;
            *exponent = (1 << n) - 1;
        } else {
            /* drop the low-order bits and the implicit leading 1 */
            *exponent = nbits - m - 1;
            *mantissa = (val >> *exponent) - (1 << m);
        }
    }

    int FPToInt(int mantissa, int exponent) {
        int tmp;

        if (exponent == ((1 << n) - 1))
            return mantissa;
        tmp = mantissa | (1 << m);    /* restore the implicit leading 1 */
        return (tmp << exponent);
    }

Figure 8: The C code for converting between integer and floating point formats. m represents the
number of bits used by the mantissa; n represents the number of bits in the exponent. Only positive
values are represented. The exponent is computed such that the first bit of the mantissa is always 1
when the number is $\ge 2^m$. By omitting this bit, we gain an extra bit in precision. If the number is
$< 2^m$ we set by convention the exponent to $2^n - 1$ to indicate this.
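For experimentation, the conversion in Figure 8 can be made self-contained as sketched below. The constants `M_BITS` and `N_BITS`, the unsigned types, the renamed functions and the `get_num_bits` helper are our assumptions (the figure leaves m, n and the helper undefined); the sizes are set to m = 3 and n = 4.

```c
#include <assert.h>

enum { M_BITS = 3, N_BITS = 4 };   /* assumed mantissa and exponent sizes */

/* Number of significant bits in val (0 for val == 0). */
static int get_num_bits(unsigned int val)
{
    int nbits = 0;
    while (val != 0) {
        nbits++;
        val >>= 1;
    }
    return nbits;
}

/* Encode val; low-order bits that do not fit are truncated. */
static void int_to_fp(unsigned int val, unsigned int *mantissa,
                      unsigned int *exponent)
{
    int nbits = get_num_bits(val);
    if (nbits <= M_BITS) {
        *mantissa = val;                    /* small value stored as is */
        *exponent = (1u << N_BITS) - 1;     /* reserved marker 2^n - 1 */
    } else {
        *exponent = (unsigned int)(nbits - M_BITS - 1);
        assert(*exponent < (1u << N_BITS) - 1);   /* must fit the format */
        *mantissa = (val >> *exponent) - (1u << M_BITS);
    }
}

/* Decode a (mantissa, exponent) pair back to an integer. */
static unsigned int fp_to_int(unsigned int mantissa, unsigned int exponent)
{
    if (exponent == (1u << N_BITS) - 1)
        return mantissa;
    return (mantissa | (1u << M_BITS)) << exponent;
}
```

With these sizes, 19 encodes to mantissa 001 and exponent 1 and decodes to 18, while 6 encodes to mantissa 110 with the reserved exponent 15 and decodes exactly.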

One possible solution is to restrict each state variable to only a small number of possible
values. For example, if a state variable is limited to eight values, only three bits are needed to
represent it. While this can be a reasonable solution in practice, in our implementation we use a
more sophisticated representation based on a floating point like format, which lets us represent
a much larger range of possible values. Since computing the eligible time and the deadline
involves only additions over these values, our representation achieves good accuracy in terms of
relative error. To further optimize the use of the available space we employ two additional
techniques. First, we use the floating point format only to represent the largest value, and then
represent the other value(s) as a fraction of the largest value. Second, in the case in which there
are states which are not required to be simultaneously encoded in the same packet, we use the same
[Figure 9 diagram: layout of the TOS byte (DS field) and the ip_off field. If OF = 0: F1 <- (l/r)/F3,
F2 <- g/F3, F3 = l/r + δ. If OF = 1: F1 <- b.]

Figure 9: For carrying state we use the four bits from the TOS byte (or DS field) reserved for local use
and experimental purposes, and up to 13 bits from the ip_off field. The first three bits specify whether
ip_off is used to encode DPS variables. F1, F2, and F3 are used to encode the DPS variables corresponding
to a data packet (codes 11x identify the state in data packet headers).

field  to  encode  them.  Next,  we  present  the  floating  point  like  format. 

Assume that $a$ is the largest value carried by the packet, where $a$ is a positive integer. To
represent $a$ we use an $m$ bit mantissa and an $n$ bit exponent. Since $a > 0$, it is possible to gain
an extra bit for the mantissa. For this we consider two cases: (a) if $a \geq 2^m$ we represent $a$ as the
closest value of the form $u 2^v$, where $2^m \leq u < 2^{m+1}$. Then, since the $(m+1)$-th, most significant,
bit in the representation of $u$ is always 1, we can ignore it. As an example, assume $m = 3$, $n = 4$,
and $a = 19 = 10011_2$. Then 19 is represented as $18 = u \times 2^v$, where $u = 9 = 1001_2$ and $v = 1$. By
ignoring the first bit in the representation of $u$ the mantissa will store 001, while the exponent
will be 1. (b) On the other hand, if $a < 2^m$, the mantissa will contain $a$, while the exponent
will be $2^n - 1$. For example, for $m = 3$, $n = 4$, and $a = 6 = 110_2$, the mantissa is 110, while
the exponent is 1111. Converting from one format to another can be efficiently implemented.
Figure 8 shows the conversion code in C. For simplicity, we assume that integers are truncated
rather than rounded when represented in floating point.
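
The two cases can be exercised with a small self-contained sketch in the spirit of Figure 8. It assumes m = 3 and n = 4 as in the worked example and truncates rather than rounds; the names `M_BITS`, `N_BITS`, `EXP_SMALL`, `num_bits`, `int_to_fp`, and `fp_to_int` are illustrative and do not come from the paper's implementation.

```c
#include <assert.h>

enum { M_BITS = 3, N_BITS = 4 };           /* m and n from the worked example */
enum { EXP_SMALL = (1 << N_BITS) - 1 };    /* reserved exponent 2^n - 1 */

/* Number of significant bits in val (0 when val is 0). */
static int num_bits(unsigned val) {
    int nbits = 0;
    while (val != 0) {
        nbits++;
        val >>= 1;
    }
    return nbits;
}

/* Encode val by truncation, dropping the implicit leading 1 of the mantissa. */
static void int_to_fp(unsigned val, unsigned *mantissa, unsigned *exponent) {
    int nbits = num_bits(val);
    assert(mantissa != 0 && exponent != 0);
    if (nbits <= M_BITS) {                 /* case (b): val < 2^m, stored directly */
        *mantissa = val;
        *exponent = (unsigned)EXP_SMALL;
    } else {                               /* case (a): val >= 2^m */
        *exponent = (unsigned)(nbits - M_BITS - 1);
        *mantissa = (val >> *exponent) - (1u << M_BITS);
    }
}

/* Decode back to an integer, restoring the implicit leading 1. */
static unsigned fp_to_int(unsigned mantissa, unsigned exponent) {
    if (exponent == (unsigned)EXP_SMALL)
        return mantissa;
    return (mantissa | (1u << M_BITS)) << exponent;
}
```

With these definitions, encoding 19 gives mantissa 001 and exponent 1, which decodes to 18, and encoding 6 gives mantissa 110 and exponent 1111, which decodes to 6, matching the two examples in the text.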

By using $m$ bits for mantissa and $n$ for exponent, we can represent any integer in the range
$[0..(2^{m+1} - 1) \times 2^{2^n - 2}]$ with a relative error bounded by $(-1/2^{m+1}, 1/2^{m+1})$. For example, with
7 bits, by allocating 3 for mantissa and 4 for exponent, we can represent any integer in the range
$[1..15 \times 2^{14}]$ with a relative error of (-6.25%, 6.25%).*

* The worst relative error case occurs when the mantissa is 8. For example the number $a = 271 = 100001111_2$
is encoded as $u = 1000_2$, $v = 5$, with a relative error of $(8 \times 2^5 - 271)/271 = -0.0554 = -5.54\%$. Similarly,
$a = 273 = 100010001_2$ is encoded as $u = 1001_2$, $v = 5$, with a relative error of $(9 \times 2^5 - 273)/273 = 5.49\%$.

If another value $b < a$ is carried by the packet we store it as the fraction $f = b/a$. Assuming
that we use $m_1$ bits to represent $f$, the absolute error is bounded by
$(-1/(2(2^{m_1} - 1)), 1/(2(2^{m_1} - 1)))$. The $-1$ in the denominators is a result of mapping $2^{m_1}$ values
to $[0, 1]$, with $2^{m_1} - 1$ representing 1. Finally, it is easy to show that by representing $a$ in
floating point format with $m$ bits for mantissa and $n$ bits for exponent, and by using $m_1$ bits to
encode $b$, the relative error of $a + b$, denoted $RelErr(a + b)$, is bounded by

$$\left(-\left(\frac{1}{2^{m+1}} + \frac{1}{2^{m_1+1} - 2}\right),\ \frac{1}{2^{m+1}} + \frac{1}{2^{m_1+1} - 2}\right),$$

where we ignore the second order term $1/(2^{m+1}(2^{m_1+1} - 2))$.

[Figure 10 diagram: hosts connected through routers.]

Figure 10: The test configuration used in experiments.
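
A short derivation sketch shows where the first-order terms and the ignored second-order term come from. It assumes the receiver reconstructs $b$ as the product of the decoded $a$ and the decoded fraction; the symbols $\hat{a}$, $\hat{f}$, $\epsilon_a$, and $\epsilon_f$ are introduced here for illustration only.

```latex
\begin{align*}
\hat{a} &= a(1+\epsilon_a), & |\epsilon_a| &\le \frac{1}{2^{m+1}}, \\
\hat{f} &= \frac{b}{a} + \epsilon_f, & |\epsilon_f| &\le \frac{1}{2(2^{m_1}-1)} = \frac{1}{2^{m_1+1}-2}, \\
\hat{a} + \hat{a}\hat{f} &= (a+b)(1+\epsilon_a) + a\,\epsilon_f\,(1+\epsilon_a), \\
|RelErr(a+b)| &= \left|\epsilon_a + \frac{a}{a+b}\,\epsilon_f\,(1+\epsilon_a)\right|
  \le \frac{1}{2^{m+1}} + \frac{1}{2^{m_1+1}-2} + \frac{1}{2^{m+1}\,(2^{m_1+1}-2)}.
\end{align*}
```

The last step uses $a/(a+b) \le 1$ for $b \ge 0$; the final summand is the second-order term.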

Figure 9 shows how the 17 bits available in the current IPv4 header are used to encode DPS
states in a data packet. The 17 bits are divided into four fields: a code field which specifies whether
the ip_off field is used to encode state variables, and three data fields, denoted F1, F2, and F3, used
to encode our variables.

The code field consists of three bits: 000 means that the packet is a fragment and therefore
no state is encoded; any other value means that up to 13 bits of ip_off are used to encode the
state. In particular, the code values specify the layout and the states encoded in the packet
header. For example, 11x specifies that the encoded states correspond to a data packet, while
100 specifies that the encoded states correspond to a dummy packet. Due to space limitations,
in Figure 9 we show the state encoding for a data packet only. In this case, the last bit of the
code field, also called Offset Field (OF), determines the content of F1. If this bit is 1, then F1
encodes the $b$ value. Otherwise it encodes $(l/r)/F3$, where $F3 = l/r + \delta$. Finally, F2 encodes
$g/F3$. We make several observations. First, since F3 encodes the largest value among all fields,
we represent it in floating point format. By using this format, with seven bits we can represent
any positive number in the range $[1..15 \times 2^{14}]$ with a relative error within (-6.25%, 6.25%).
Second, since the deadline determines the delay guarantees, we use a representation that trades
the eligible time accuracy† for the deadline accuracy. In particular, the deadline is computed as
$d = current\_time + F2 * F3 + F3 \approx current\_time + g + l/r + \delta$. If OF is 0, the eligible time is
computed as $e = d - F1 * F3 \approx current\_time + g + \delta$. F1 uses only three bits and its value is
computed such that $F1 * F3$ always over-estimates $l/r$. If OF is 1, the eligible time is computed
simply as $e = current\_time$. Third, we express $b$ in units equal to the maximum packet size.
In this way we eliminate the need for each packet to carry the $b$ value. In fact, if a flow sends
at its reserved rate, only one packet out of every eight needs to carry the $b$ value. This
observation, combined with the fact that the under-estimation of the packet eligible time does
not affect the guaranteed delay of the flow, allows us to alternatively encode either $b$ or $(l/r)/F3$
in F1, without impacting the correctness of our algorithms.

† As long as the eligible time value is under-estimated, its inaccuracy will affect only the scheduling
complexity, as the packet may become eligible earlier.

Figure 11: Packet arrival and departure times for a 10 Mbps flow at (a) the ingress node, and (b) the
egress node.
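
The decoding rules for a data packet can be sketched as follows. This is an illustration under stated assumptions, not the FreeBSD implementation: F3 is taken as already decoded from its seven-bit floating point form into a time value, F1 is three bits wide as stated in the text, and F2 is assumed to take the remaining four of the 17 bits. The names `dps_state`, `code_to_fraction`, `dps_deadline`, and `dps_eligible` are hypothetical.

```c
#include <assert.h>

enum { F1_BITS = 3, F2_BITS = 4 };  /* F1 width is from the text; F2 width is assumed */

struct dps_state {
    unsigned of;   /* Offset Field: last bit of the code field */
    unsigned f1;   /* raw F1 code */
    unsigned f2;   /* raw F2 code */
    double   f3;   /* F3 = l/r + delta, already decoded into time units */
};

/* Map a raw code of the given width onto [0, 1]; the all-ones code means 1. */
static double code_to_fraction(unsigned code, int bits) {
    unsigned max_code = (1u << bits) - 1u;
    assert(code <= max_code);
    return (double)code / (double)max_code;
}

/* Deadline: d = current_time + F2 * F3 + F3. */
static double dps_deadline(const struct dps_state *s, double now) {
    return now + code_to_fraction(s->f2, F2_BITS) * s->f3 + s->f3;
}

/* Eligible time: e = d - F1 * F3 if OF is 0; e = current_time if OF is 1,
 * because F1 then carries the b value instead of (l/r)/F3. */
static double dps_eligible(const struct dps_state *s, double now) {
    if (s->of == 1)
        return now;
    return dps_deadline(s, now) - code_to_fraction(s->f1, F1_BITS) * s->f3;
}
```

Because F2 and F1 are fractions of F3, only F3 needs the wide dynamic range of the floating point format, which is the space saving the text describes.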


5.3  Experimental  Results 

We have implemented these algorithms in FreeBSD v2.2.6 and deployed them in a testbed consisting
of 266 MHz and 300 MHz Pentium II PCs connected by point-to-point 100 Mbps Ethernets.
The  testbed  allows  configuring  a  path  with  up  to  two  intermediate  routers. 

In  the  following,  we  present  results  from  four  simple  experiments.  The  experiments  are 
designed  to  illustrate  the  microscopic  behaviors  of  the  algorithms,  rather  than  their  scalability. 
All  experiments  were  run  on  the  topology  shown  in  Figure  10.  The  first  router  is  configured  as 
an  ingress  node,  while  the  second  router  is  configured  as  an  egress  node.  An  egress  node  also 
implements  the  functionalities  of  a  core  node.  In  addition,  it  restores  the  initial  values  of  the 
field.  All  traffic  is  UDP  and  all  packets  are  1000  bytes,  not  including  the  header. 

In the first experiment we consider a flow between hosts 1 and 3 that has a reservation of
10 Mbps but sends at a much higher rate of about 30 Mbps. Figures 11(a) and (b) plot the
arrival and departure times for the first 30 packets of the flow at the ingress and egress node,
respectively. One thing to notice in Figure 11(a) is that the arrival rate at the ingress node
is almost three times the departure rate, which is the same as the reserved rate of 10 Mbps.
This illustrates the non-work-conserving nature of the CJVC algorithm, which enforces the traffic
profile and allows only 10 Mbps of traffic into the network. Another thing to notice is that all
packets incur about 0.8 ms delay in the egress node. This is because they are sent by the ingress
node as soon as they become eligible, and therefore $g \approx l/r = 8 \times 1052$ bits / 10 Mbps = 0.84 ms.
As a result, they will be held in the rate-controller for this amount of time at the next hop‡,
which is the egress node in our case.

‡ Note that since all packets have the same size, $\delta = 0$.

Figure 12: The packets' arrival and departure times for four flows. The first three flows are guaranteed,
with reservations of 10 Mbps, 20 Mbps, and 40 Mbps. The last flow is best effort with an arrival rate
of about 60 Mbps.

In the second experiment we consider three guaranteed flows between hosts 1 and 3 with
reservations of 10 Mbps, 20 Mbps, and 40 Mbps, respectively. In addition, we consider a fourth
UDP flow between hosts 2 and 4 which is treated as best effort. The arrival rates of the first
three flows are slightly larger than their reservations, while the arrival rate of the fourth flow
is approximately 60 Mbps. At time 0, only the best-effort flow is active. At time 2.8 ms, the
first three flows become simultaneously active. Flows 1 and 2 terminate after sending 12 and 35
packets, respectively. Figure 12 shows the packet arrival and departure times for the best-effort
flow 4, and the packet departure times for the real-time flows 1, 2, and 3. As can be seen,
the best-effort packets experience very low delay in the initial period of 2.8 ms. After the QoS
flows become active, best-effort packets experience longer delays while QoS flows receive service
at their reserved rate. After flows 1 and 2 terminate, the best-effort traffic grabs the remaining
bandwidth.

Figure 13: The estimated aggregate reservation $R_{DPS}$ and the bounds $R_{bound}$ and $R_{cal}$ in the case of (a)
two ON-OFF flows with reservations of 0.5 Mbps and 1.5 Mbps, respectively, and in the case when
(b) one reservation of 0.5 Mbps is accepted at time t = 18 seconds, and then is terminated at t = 39
seconds.

The last two experiments illustrate the algorithms for admission control described in Section 4.3.
The first experiment demonstrates the accuracy of estimating the aggregate reservation
based on the $b$ values carried in the packet headers. The second experiment illustrates the
computation of the aggregate reservation bound, $R_{bound}$, when a new reservation is accepted or a
reservation terminates. In these experiments we use an averaging interval, $T_W$, of 5 seconds, and
a maximum inter-departure time, $T_I$, of 500 ms. For simplicity, we neglect the delay jitter, i.e.,
we assume $T_J = 0$. This gives us $f = (T_I + T_J)/T_W = 0.1$.

In the first experiment we consider two flows, one with a reservation of 0.5 Mbps, and the
other with a reservation of 1.5 Mbps. Figure 13(a) plots the arrival rate of each flow, as well
as the arrival rate of the aggregate traffic. In addition, Figure 13(a) plots the bound of the
aggregate reservation used by the admission test, $R_{bound}$, the estimate of the aggregate reservation
$R_{DPS}$, and the bound $R_{cal}$ used to recalibrate $R_{bound}$. According to the pseudocode in Figure 7,
both $R_{DPS}$ and $R_{cal}$ are updated at the end of each estimation interval. More precisely, every
5 seconds $R_{DPS}$ is computed based on the $b$ values carried in the packet headers, while $R_{cal}$ is
computed as $R_{DPS}/(1 - f) + R_{new}$. Note that since in this case no new reservation is accepted, we
have $R_{new} = 0$, which yields $R_{cal} = R_{DPS}/(1 - f)$. The important thing to note in Figure 13(a)
is that the rate variation of the actual traffic (represented by the continuous line) has little effect
on the accuracy of computing the aggregate reservation estimate $R_{DPS}$, and consequently of
$R_{cal}$. In contrast, traditional measurement based admission control algorithms, which base their
[Table 4 values, in order of appearance: 5.02, 1.63, 4.38, 1.55, 5.36, 1.75, 4.60, 5.40; dequeue:
1.52, 3.14, 3.27, 2.69, 2.81, 2.79, 3.68, 2.30, 2.91, 2.77, 2.82, 1.73, 2.12]

Table 4: The average and standard deviation of the enqueue and dequeue times, measured in μs.

estimation on the actual traffic, would significantly under-estimate the aggregate reservation,
especially during the time periods when no data packets are received. In addition, note that
since in this experiment $R_{cal}$ is always larger than $R_{bound}$, and no new reservations are accepted,
the value of $R_{bound}$ is never updated.

In the second experiment we consider a scenario in which a new reservation of 0.5 Mbps is
accepted at time t = 18 seconds and terminates approximately at time t = 39 seconds. For the
entire time duration, plotted in Figure 13(b), we have background traffic with an aggregate
reservation of 0.5 Mbps. Similarly to the previous case, we plot the rate of the aggregate traffic,
and, in addition, $R_{bound}$, $R_{cal}$, and $R_{DPS}$. There are several points worth noting. First, when
the reservation is accepted at time t = 18 seconds, $R_{bound}$ increases by the value of the accepted
reservation, i.e., 0.5 Mbps (see Figure 7). In this way, $R_{bound}$ is guaranteed to remain an upper
bound of the aggregate reservation $R$. In contrast, since both $R_{DPS}$ and $R_{cal}$ are updated only at
the end of the estimation interval, they under-estimate the aggregate reservation, as well as the
aggregate traffic, before time t = 20 seconds. Second, after $R_{cal}$ is updated at time t = 20 seconds,
as $R_{DPS}/(1 - f) + R_{new}$, the new value significantly over-estimates the aggregate reservation.
This is the main reason for which we do not use $R_{cal}$ ($+ R_{new}$), but $R_{bound}$, to do the admission
control test. Third, note that unlike the case when the reservation was accepted, $R_{bound}$ does not
change when the reservation terminates at time t = 39 seconds. This is simply because in our
implementation no tear-down message is generated when a reservation terminates. However, as
$R_{cal}$ is updated at the end of the next estimation interval (i.e., at time t = 45 seconds), $R_{bound}$
drops to the correct value of 0.5 Mbps. This shows the importance of using $R_{cal}$ to recalibrate
$R_{bound}$. In addition, this illustrates the robustness of our algorithm, i.e., the over-estimation
in a previous period is corrected in the next period. Finally, note that in both experiments
$R_{DPS}$ always under-estimates the aggregate reservation. This is due to the truncation errors in
computing both the $b$ values and the $R_{DPS}$ estimate.
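
The interplay between the admission bound and its recalibration can be sketched as below. This is a minimal illustration, not the pseudocode of Figure 7: the capacity-based admission test, the reset of the accepted-rate accumulator at each interval boundary, and the names `admission`, `admit`, and `end_of_interval` are assumptions made here.

```c
#include <assert.h>

struct admission {
    double capacity;  /* rate available to reservations (assumed admission test) */
    double f;         /* (T_I + T_J) / T_W */
    double r_bound;   /* upper bound on the aggregate reservation */
    double r_new;     /* rate accepted during the current estimation interval */
};

/* Accept a reservation only if the bound stays within capacity (assumed test).
 * On acceptance the bound grows immediately by the accepted rate. */
static int admit(struct admission *a, double rate) {
    assert(rate > 0.0);
    if (a->r_bound + rate > a->capacity)
        return 0;
    a->r_bound += rate;
    a->r_new += rate;
    return 1;
}

/* At the end of an estimation interval, compute R_cal = R_DPS/(1 - f) + R_new
 * and lower the bound to R_cal if R_cal is smaller. Returns R_cal. */
static double end_of_interval(struct admission *a, double r_dps) {
    double r_cal = r_dps / (1.0 - a->f) + a->r_new;
    if (r_cal < a->r_bound)
        a->r_bound = r_cal;
    a->r_new = 0.0;   /* assumed: the accumulator restarts each interval */
    return r_cal;
}
```

In this sketch a terminated reservation needs no tear-down message: once its packets stop contributing to the estimate passed to `end_of_interval`, the bound is pulled back down at the next interval boundary.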


5.4 Processing Overhead

To evaluate the overhead of our algorithm we have performed three experiments on a 300 MHz
Pentium II involving 1, 10, and 100 flows, respectively. The reservation and actual sending rates
of all flows are identical. The aggregate sending rate is about 20% larger than the aggregate
reservation rate. Table 4 shows the means and the standard deviations for the enqueue and
dequeue times at both ingress and egress nodes. Each of these numbers is based on a measurement
of 1000 packets. For comparison we also show the enqueue and dequeue times for the unmodified
code. There are several points worth noting. First, our implementation adds less than 5 μs
overhead per enqueue operation, and about 2 μs per dequeue operation. In addition, both
the enqueue and dequeue times at the ingress node are greater than at the egress node. This
is because the ingress node performs per flow operations. Furthermore, as the number of flows
increases the enqueue times increase only slightly, i.e., by less than 20%. This suggests that
our algorithm is indeed scalable in the number of flows. Finally, the dequeue times actually
decrease as the number of flows increases. This is because the rate-controller is implemented as
a calendar queue with each entry corresponding to a 128 μs time interval. Packets with eligible
times falling within the same interval are stored in the same entry. Therefore, when the number
of flows is large, more packets are stored in the same calendar queue entry. Since all these packets
are transferred during one operation when they become eligible, the actual overhead per packet
decreases.
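
The amortization effect can be illustrated with a minimal calendar-queue sketch. Only the 128 μs entry width comes from the text; the number of entries, the singly linked list per entry, and the names `pkt`, `calendar`, `cal_insert`, and `cal_release` are assumptions, and the sketch presumes eligible times never lie more than one full calendar ahead of the current time.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT_US = 128, NUM_SLOTS = 64 };  /* entry width from the text; entry count assumed */

struct pkt {
    unsigned long eligible_us;   /* eligible time in microseconds */
    struct pkt *next;
};

struct calendar {
    struct pkt *slot[NUM_SLOTS]; /* one list of packets per 128 us interval */
};

/* Calendar entry covering time t_us. */
static size_t slot_index(unsigned long t_us) {
    return (size_t)((t_us / SLOT_US) % NUM_SLOTS);
}

/* Store the packet in the entry that covers its eligible time. */
static void cal_insert(struct calendar *c, struct pkt *p) {
    size_t i;
    assert(c != NULL && p != NULL);
    i = slot_index(p->eligible_us);
    p->next = c->slot[i];
    c->slot[i] = p;
}

/* Detach, in a single operation, every packet stored in the entry covering now_us. */
static struct pkt *cal_release(struct calendar *c, unsigned long now_us) {
    size_t i = slot_index(now_us);
    struct pkt *batch = c->slot[i];
    c->slot[i] = NULL;
    return batch;
}
```

The cost of `cal_release` does not depend on how many packets share the entry, so with more flows the same fixed cost is spread over more packets.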


6  Related  Work 

Our scheme shares its intellectual roots with two pieces of related work: Diffserv and
Core-Stateless Fair Queueing.

The  idea  of  implementing  QoS  services  by  using  a  core-stateless  architecture  was  first  proposed 
by  Jacobson  [22]  and  Clark  [7],  and  is  now  being  pursued  by  the  IETF  Diffserv  working  group  [12]. 
There  are  several  differences  between  our  scheme  and  the  existing  Diffserv  proposals.  First,  our 
algorithms  operate  at  a  much  finer  granularity  both  in  terms  of  time  and  traffic  aggregates:  the 
state  embedded  in  a  packet  can  be  highly  dynamic,  as  it  encodes  the  current  state  of  the  flow, 
rather  than  the  static  and  global  properties  such  as  dropping  or  scheduling  priority.  In  addition, 
the  goal  of  our  scheme  is  to  implement  distributed  algorithms  that  try  to  approximate  the  services 
provided  by  a  network  in  which  all  routers  implement  per  flow  management.  Therefore,  we  can 
provide  service  differentiation  and  performance  guarantees  on  a  per  flow  basis.  In  contrast, 
existing Diffserv solutions provide service differentiation only among a small number of traffic
classes. Finally, we propose fully distributed and dynamic algorithms for implementing both
data and control functionalities, whereas existing Diffserv solutions rely on more centralized and
static algorithms for implementing admission control.

We  first  proposed  the  idea  of  using  Dynamic  Packet  State  to  encode  dynamic  per  flow  state  in 
the  context  of  approximating  the  Fair  Queueing  algorithm  in  a  SCORE  architecture  [29].  While 
algorithms  proposed  in  this  paper  share  the  same  architecture  as  CSFQ,  there  are  important 
differences  both  in  high  level  goals  and  low  level  mechanisms.  First,  while  CSFQ  was  designed  to 
support  best-effort  traffic,  algorithms  proposed  here  are  designed  to  support  guaranteed  services. 
As  a  consequence,  while  CSFQ  can  use  a  probabilistic  forwarding  algorithm  to  statistically 
approximate the Fair Queueing service, CJVC needs to use more elaborate mechanisms to provide
performance guarantees identical to those provided by Virtual Clock or Weighted Fair Queueing
algorithms. In particular, CJVC uses three types of Dynamic Packet State for scheduling purposes
and regulates traffic at each hop. One more type of Dynamic Packet State is used to implement
the admission control, which was not needed in CSFQ. Finally, we have proposed a detailed design
for  encoding  the  DPS  variables  in  IPv4. 

In  this  paper,  we  propose  a  technique  to  estimate  the  aggregate  reservation  rate  and  use 
that  estimate  to  perform  admission  control.  While  this  may  look  similar  to  measurement-based 
admission  control  algorithms  [19,  32],  the  objectives  and  thus  the  techniques  are  quite  different. 
Measurement-based admission control algorithms are designed to support controlled-load
types of service; their estimation is based on the actual amount of traffic transmitted in the past,
and is usually an optimistic estimate in the sense that the estimated aggregate rate is smaller
than the aggregate reserved rate. While this has the benefit of increasing the network utilization
by the controlled-load service traffic, it risks incurring transient overloads that may
degrade QoS. In contrast, our algorithm aims to support guaranteed service,
and the goal is to estimate a close upper bound on the aggregate reserved rate even when the
actual arrival rate varies.

In [9], Cruz proposed a novel scheduling algorithm called SCED+ in the context of ATM
networks. In SCED+, virtual circuits sharing the same path segment are aggregated into a virtual
path. At each switch, only per virtual path state, instead of per virtual circuit state, needs to be
maintained for scheduling purposes. In addition, an algorithm is proposed to compute the eligible
times and the deadlines of a packet at subsequent nodes when the packet enters a virtual path.
We note that by doing this and using DPS to carry this information in the packets' headers, it
is possible to remove per path scheduling state from core nodes. However, unlike our solution,
SCED+ does not provide per flow delay differentiation within an aggregate. In addition, the
SCED+ work focuses on the data path mechanism, while we address both data path and
control path issues.


7 Conclusion


In  this  paper,  we  developed  two  distributed  algorithms  that  implement  QoS  scheduling  and 
admission control in a SCORE network where core routers do not maintain per flow state. Combined,
these two algorithms significantly enhance the scalability of both the data and control
plane mechanisms for implementing guaranteed services, and at the same time, provide guaranteed
services with flexibility, utilization, and assurance levels similar to those that can be provided
with  per  flow  mechanisms.  The  key  technique  used  in  both  algorithms  is  called  Dynamic  Packet 
State  (DPS),  which  provides  a  lightweight  and  robust  means  for  routers  to  coordinate  actions 
and  implement  distributed  algorithms.  By  presenting  a  design  and  prototype  implementation  of 
the  proposed  algorithms  in  IPv4  networks,  we  demonstrate  that  it  is  indeed  possible  to  apply 
DPS  techniques  and  have  minimum  incompatibility  with  existing  protocols. 

As  a  final  note,  we  believe  DPS  is  a  powerful  concept.  By  using  DPS  to  coordinate  actions  of 
edge  and  core  routers  along  the  path  traversed  by  a  flow,  distributed  algorithms  can  be  designed 
to  approximate  the  behavior  of  a  broad  class  of  “stateful”  networks  with  networks  in  which  core 
routers  do  not  maintain  per  flow  state.  We  observe  that  it  is  possible  to  extend  the  current 
Diffserv  framework  to  accommodate  algorithms  using  Dynamic  Packet  State  such  as  the  ones 
proposed  in  this  paper  and  Core-Stateless  Fair  Queueing  [29].  The  key  extension  needed  is  to 
associate  with  each  Per  Hop  Behavior  (PHB)  additional  space  in  the  packet  header  for  storing 
PHB  specific  Dynamic  Packet  State  [31].  Such  a  paradigm  will  significantly  increase  the  flexibility 
and  capabilities  of  the  services  that  can  be  built  with  a  Diffserv-like  architecture. 


Appendix A: Network Utilization of Premium Service in
Diffserv Networks

Premium  service  provides  the  equivalent  of  a  dedicated  link  of  fixed  bandwidth  between  edge 
nodes  in  a  Diffserv  network.  In  such  a  service,  each  premium  flow  has  a  reserved  peak  rate. 
In  the  data  plane,  ingress  nodes  police  each  premium  service  traffic  flow  according  to  its  peak 
reservation  rate.  Inside  the  Diffserv  domain,  core  routers  put  the  aggregate  of  all  premium  traffic 
into  one  scheduling  queue  and  service  the  premium  traffic  with  strict  priority  over  best  effort 
traffic.  In  the  control  plane,  a  bandwidth  broker  is  used  to  perform  admission  control.  The  idea 
is  that  by  using  very  conservative  admission  control  algorithms  based  on  worst  case  analysis, 
together  with  peak  rate  policing  at  ingress  nodes  and  static  priority  scheduling  at  core  nodes,  it 
is  possible  to  ensure  that  all  premium  service  packets  incur  very  small  queueing  delay. 

One  important  design  question  is:  how  conservative  does  the  admission  control  algorithm 
need  to  be?  In  other  words,  what  is  the  upper  limit  on  the  utilization  of  the  network  capacity 
that  can  be  allocated  to  premium  traffic  if  we  want  the  premium  service  to  achieve  the  same 
level of service assurance as the guaranteed service, such that the queueing delay of all premium
service  packets  is  bounded  by  a  fixed  number  even  in  the  worst  case? 

For  the  purpose  of  this  discussion,  we  use  flow  to  refer  to  a  subset  of  packets  that  traverse 
the  same  path  inside  a  Diffserv  domain  between  two  edge  nodes.  Thus,  with  the  highest  level 
of  traffic  aggregation,  a  flow  consists  of  all  packets  between  the  same  pair  of  ingress  and  egress 
nodes.  Note  that  even  in  this  case,  the  number  of  flows  in  a  network  can  be  quite  large  as  it  may 
increase  quadratically  with  the  number  of  edge  nodes. 

Let us consider a domain consisting of 4 × 4 routers with links of capacity $C$. Assume that the
fraction of the link capacity allocated to the premium traffic is limited to $\gamma$. Assume also that all
flows have equal packet sizes, and that each ingress node shapes not only each flow, but also the
aggregate traffic at each of its outputs. Figure 14(a) shows the traffic pattern at the first core
router along a path. Each input receives 12 identical flows, where each flow has a reservation of
$\gamma C/12 = C/48$. Let $\tau$ be the transmission time of one packet; then, as shown in the figure, the
inter-arrival time between two consecutive packets in each flow is $48\tau$, and the inter-arrival
time between two consecutive packets in the aggregate flow is $4\tau$.

Assume the first three flows at each input are forwarded to output 1. This will cause a burst
of 12 packets to arrive at output 1 in an $8\tau$ long interval and the last packet of the burst to incur an
additional delay of $3\tau$. Now assume that the next router receives at each input a traffic pattern
similar to the one generated by output 1 of the first core router, as shown in Figure 14(b). In
addition, assume that the last three flows from each input burst are forwarded to output 1. This
will cause a burst of 12 packets to arrive at output 1 in a $2\tau$ long interval and the last packet in the
burst to incur an additional delay of $9\tau$. Thus, after two hops, a packet is delayed by as much
as $12\tau$. This pattern can be repeated for all subsequent hops.

Figure 14: Per-hop worst-case delay experienced by premium traffic in a Diffserv domain. (a) and (b)
show the traffic pattern at the first and a subsequent node. The black and all dark grey packets go to
the first output; the light grey packets go to the other outputs.

In general, consider a $k \times k$ router, and let $n$ be the number of flows that traverse each link.
For simplicity, assume that $\gamma \geq 1/k$. Then it can be shown that the worst case delay experienced
by a packet after $h$ hops is

$$D = \left(n - 1 - \frac{1}{\gamma}\left(\frac{n}{k} - 1\right)\right)\tau + (h - 1)\, n\, \frac{k-1}{k}\, \tau + h\tau, \qquad (23)$$

where the first term is the additional delay at the first hop, the second term is the additional
delay at all subsequent hops, and the last term accounts for the packet transmission time at each
hop. As a numerical example, let $C = 1$ Gbps, the packet size be 1500 bytes, $k = 16$, $\gamma = 10\%$,
$n = 1500$, and $h = 15$. From here we obtain $\tau = 12$ μsec, and a delay $D$ of over 240 ms. Finally,
if $\gamma < 1/k$, it can be shown that it will take only $\lceil \log_k(1/\gamma) \rceil$ hops to achieve a continuous burst.
For example, for $\gamma = 1\%$ and $k = 16$, it takes only two hops to obtain a continuous burst.
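
The numerical example can be checked with a short sketch. It assumes the worst-case delay has the three-term form described in the text, with a first-hop term of (n - 1 - (n/k - 1)/γ)τ and a per-subsequent-hop term of n(k - 1)/k τ, which reproduces the 3τ and 9τ of the 4 × 4 example; the function names `premium_worst_delay` and `hops_to_continuous_burst` are hypothetical.

```c
#include <assert.h>

/* Worst-case delay after h hops through k x k routers, with n flows per link,
 * premium share gamma, and packet transmission time tau (in seconds). */
static double premium_worst_delay(double n, double k, double gamma,
                                  double h, double tau) {
    double first, later;
    assert(gamma > 0.0 && k > 0.0);
    first = (n - 1.0 - (n / k - 1.0) / gamma) * tau;   /* extra delay, first hop */
    later = (h - 1.0) * n * (k - 1.0) / k * tau;       /* extra delay, later hops */
    return first + later + h * tau;                    /* plus transmission times */
}

/* Smallest number of hops with k^hops >= 1/gamma, i.e. ceil(log_k(1/gamma)),
 * computed by repeated multiplication to avoid floating point logarithms. */
static int hops_to_continuous_burst(int k, double gamma) {
    int hops = 0;
    double reach = 1.0;
    assert(k > 1 && gamma > 0.0);
    while (reach * gamma < 1.0) {
        reach *= (double)k;
        hops++;
    }
    return hops;
}
```

With τ = 1500 × 8 bits / 1 Gbps = 12 μs, k = 16, γ = 0.1, n = 1500, and h = 15, the three terms sum to 20274τ, or roughly 243 ms, in line with the "over 240 ms" quoted in the text.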

The above example demonstrates that low network utilization and traffic shaping at ingress
nodes alone are not enough to guarantee a “small” worst-case delay for all the premium traffic.
This result is not surprising. Even using a per flow scheduler like Weighted Fair Queueing
(WFQ) will not help to reduce the worst case end-to-end delay for all packets. In fact, if all
flows in the above example are given the same weight, the worst case delay under WFQ is $hn\tau$,
which is basically the same as the one given by Eq. (23). However, the major advantage of using
WFQ is that it allows us to differentiate among flows, which is a critical property as long as we
cannot guarantee a “small” delay to all flows. In addition, WFQ can achieve 100% utilization.


Appendix  B:  Proof  of  Theorem  1 


In  this  appendix  we  show  that  a  network  of  CJVC  servers  provides  the  same  end-to-end  delay 
guarantees  as  a  network  of  Jitter-VC  servers.  In  particular,  in  Theorem  1  we  show  that  the 
deadline  of  a  packet  at  the  last  hop  in  both  systems  is  the  same.  This  result  is  based  on  Lemmas  2 
and  3  which  give  the  expressions  of  the  deadline  of  a  packet  at  the  last  hop  in  a  network  of  Jitter- 
VC,  and  a  network  of  CJVC  servers,  respectively.  First,  we  present  a  preliminary  result  used  in 
proving  Lemma  2. 

Lemma 1 Consider a network of Jitter-VC servers. Let $\pi_j$ denote the propagation delay between
hops $j$ and $j+1$, and let $\tau_j$ be the maximum transmission time of a packet at node $j$. Then for
any $j > 1$ and $i, k \ge 1$ we have

$$d_{i,j+1}^k - d_{i,j}^k - \tau_j - \pi_j \ \ge\ d_{i,j}^k - d_{i,j-1}^k - \tau_{j-1} - \pi_{j-1}. \qquad (24)$$

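Proof. The proof is by induction on $k$. First, recall that by definition $g_{i,j}^k = d_{i,j}^k + \tau_j - s_{i,j}^k$ (see
Table 1), and that for $j > 1$, $a_{i,j}^k = s_{i,j-1}^k + \pi_{j-1}$. From here and from Eqs. (1) and (2) we have
then

$$d_{i,j}^k = \max(a_{i,j}^k + g_{i,j-1}^k,\ d_{i,j}^{k-1}) + \frac{l_i^k}{r_i} = \max(d_{i,j-1}^k + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^{k-1}) + \frac{l_i^k}{r_i}. \qquad (25)$$

Basic Step. For $k = 1$ and any $j > 1$, from Eq. (25) we have trivially $d_{i,j}^1 = d_{i,j-1}^1 + \tau_{j-1} + \pi_{j-1} + l_i^1/r_i$,
$\forall j > 1$, and therefore $d_{i,j}^1 - d_{i,j-1}^1 - \tau_{j-1} - \pi_{j-1} = l_i^1/r_i$, $\forall j > 1$.

Induction Step. Assume Ineq. (24) is true for $k$. Then we need to show that

$$\begin{aligned}
& d_{i,j+1}^{k+1} - d_{i,j}^{k+1} - \tau_j - \pi_j \ \ge\ d_{i,j}^{k+1} - d_{i,j-1}^{k+1} - \tau_{j-1} - \pi_{j-1} \iff \\
& \max(d_{i,j}^{k+1} + \tau_j + \pi_j,\ d_{i,j+1}^k) - \max(d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^k) - \tau_j - \pi_j \ \ge \\
& \max(d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^k) - \max(d_{i,j-2}^{k+1} + \tau_{j-2} + \pi_{j-2},\ d_{i,j-1}^k) - \tau_{j-1} - \pi_{j-1},
\end{aligned} \qquad (26)$$

where the second inequality follows after using Eq. (25). Next consider two cases: whether
$d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1} \le d_{i,j}^k$ or not. Assume $d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1} \le d_{i,j}^k$. From Ineq. (26) and from
the induction hypothesis we obtain

$$\begin{aligned}
& \max(d_{i,j}^{k+1} + \tau_j + \pi_j,\ d_{i,j+1}^k) - \max(d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^k) - \tau_j - \pi_j \\
& \quad = \max(d_{i,j}^{k+1} + \tau_j + \pi_j,\ d_{i,j+1}^k) - d_{i,j}^k - \tau_j - \pi_j \\
& \quad \ge d_{i,j+1}^k - d_{i,j}^k - \tau_j - \pi_j \\
& \quad \ge d_{i,j}^k - d_{i,j-1}^k - \tau_{j-1} - \pi_{j-1} \qquad \text{(induction hypothesis)} \\
& \quad \ge d_{i,j}^k - \max(d_{i,j-2}^{k+1} + \tau_{j-2} + \pi_{j-2},\ d_{i,j-1}^k) - \tau_{j-1} - \pi_{j-1} \\
& \quad = \max(d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^k) - \max(d_{i,j-2}^{k+1} + \tau_{j-2} + \pi_{j-2},\ d_{i,j-1}^k) - \tau_{j-1} - \pi_{j-1}.
\end{aligned} \qquad (27)$$

Next, assume that

$$d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1} > d_{i,j}^k. \qquad (28)$$

From here and by using Eq. (25) and Ineq. (26) we have

$$\begin{aligned}
d_{i,j+1}^{k+1} - d_{i,j}^{k+1} - \tau_j - \pi_j
& = \max(d_{i,j}^{k+1} + \tau_j + \pi_j,\ d_{i,j+1}^k) - \max(d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^k) - \tau_j - \pi_j \\
& = \max(d_{i,j}^{k+1} + \tau_j + \pi_j,\ d_{i,j+1}^k) - d_{i,j-1}^{k+1} - \tau_{j-1} - \pi_{j-1} - \tau_j - \pi_j \\
& \ge d_{i,j}^{k+1} - d_{i,j-1}^{k+1} - \tau_{j-1} - \pi_{j-1} \\
& = d_{i,j}^{k+1} - \max(d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^k) \qquad \text{(Ineq. (28))} \\
& = \frac{l_i^{k+1}}{r_i} \qquad \text{(Eq. (25))} \\
& = d_{i,j-1}^{k+1} - \max(d_{i,j-2}^{k+1} + \tau_{j-2} + \pi_{j-2},\ d_{i,j-1}^k) \qquad \text{(Eq. (25))} \\
& = \max(d_{i,j-1}^{k+1} + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^k) - \tau_{j-1} - \pi_{j-1} - \max(d_{i,j-2}^{k+1} + \tau_{j-2} + \pi_{j-2},\ d_{i,j-1}^k) \qquad \text{(Ineq. (28))} \\
& = d_{i,j}^{k+1} - d_{i,j-1}^{k+1} - \tau_{j-1} - \pi_{j-1}.
\end{aligned} \qquad (29)$$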


This  completes  the  proof.  □ 


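Lemma 2 The deadline of any packet $p_i^k$, $k \ge 1$, at the last hop $h$ in a network of Jitter-VC
servers is

$$d_{i,h}^k = \max\left( e_{i,1}^k + h \frac{l_i^k}{r_i} + \sum_{m=1}^{h-1} (\tau_m + \pi_m),\ d_{i,h}^{k-1} + \frac{l_i^k}{r_i} \right). \qquad (30)$$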


Proof. Let $j^* > 1$ be the last hop for which $d_{i,j^*-1}^k + \tau_{j^*-1} + \pi_{j^*-1} < d_{i,j^*}^{k-1}$. We consider two
cases whether $j^*$ exists or not.

Case 1. ($j^*$ does not exist) From Eq. (1) we have $e_{i,j}^k = d_{i,j-1}^k + \tau_{j-1} + \pi_{j-1}$, $\forall j > 1$. From here
and by using Eq. (2) we obtain

$$d_{i,h}^k = e_{i,1}^k + h \frac{l_i^k}{r_i} + \sum_{m=1}^{h-1} (\tau_m + \pi_m). \qquad (31)$$

Because we assume that $j^*$ does not exist we also have $d_{i,h}^k \ge d_{i,h}^{k-1} + l_i^k/r_i$, which
concludes the proof of this case.


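Case 2. ($j^*$ exists) In this case we show that $j^* = h$. Assume this is not true. Then we have
$e_{i,j}^k = d_{i,j-1}^k + \tau_{j-1} + \pi_{j-1}$, $\forall j > j^*$. By using Eq. (2) we obtain

$$d_{i,h}^k = e_{i,j^*}^k + (h - j^* + 1) \frac{l_i^k}{r_i} + \sum_{m=j^*}^{h-1} (\tau_m + \pi_m). \qquad (32)$$

On the other hand, by the definition of $j^*$ and from Eqs. (1) and (2) we have

$$d_{i,j^*}^k = \max(d_{i,j^*-1}^k + \tau_{j^*-1} + \pi_{j^*-1},\ d_{i,j^*}^{k-1}) + \frac{l_i^k}{r_i} > d_{i,j^*-1}^k + \tau_{j^*-1} + \pi_{j^*-1} + \frac{l_i^k}{r_i}. \qquad (33)$$

As a result we obtain $d_{i,j^*}^k - d_{i,j^*-1}^k - \tau_{j^*-1} - \pi_{j^*-1} > l_i^k/r_i$. By iteratively applying Lemma 1 we
have

$$d_{i,m+1}^k - d_{i,m}^k - \tau_m - \pi_m \ \ge\ d_{i,j^*}^k - d_{i,j^*-1}^k - \tau_{j^*-1} - \pi_{j^*-1} > \frac{l_i^k}{r_i}, \quad \forall m \ge j^*. \qquad (34)$$

From Ineq. (34) we obtain

$$\sum_{m=j^*}^{h-1} (d_{i,m+1}^k - d_{i,m}^k - \tau_m - \pi_m) \ \ge\ (h - j^*)(d_{i,j^*}^k - d_{i,j^*-1}^k - \tau_{j^*-1} - \pi_{j^*-1}) > (h - j^*) \frac{l_i^k}{r_i}, \qquad (35)$$

where the left-hand side term can be expressed as

$$\sum_{m=j^*}^{h-1} (d_{i,m+1}^k - d_{i,m}^k - \tau_m - \pi_m) = d_{i,h}^k - d_{i,j^*}^k - \sum_{m=j^*}^{h-1} (\tau_m + \pi_m). \qquad (36)$$

By combining Ineq. (35) and Eq. (36) we get

$$d_{i,h}^k > d_{i,j^*}^k + (h - j^*) \frac{l_i^k}{r_i} + \sum_{m=j^*}^{h-1} (\tau_m + \pi_m) = e_{i,j^*}^k + (h - j^* + 1) \frac{l_i^k}{r_i} + \sum_{m=j^*}^{h-1} (\tau_m + \pi_m). \qquad (37)$$

But this inequality contradicts Eq. (32) and therefore proves our statement, i.e., $j^* = h$. Thus,
$d_{i,h-1}^k + \tau_{h-1} + \pi_{h-1} < d_{i,h}^{k-1}$. From here and from Eqs. (1) and (2) we get $e_{i,h}^k = d_{i,h}^{k-1}$, and therefore

$$d_{i,h}^k = d_{i,h}^{k-1} + \frac{l_i^k}{r_i}. \qquad (38)$$

Now, from Eq. (1) it follows trivially that

$$e_{i,j}^k = \max(d_{i,j-1}^k + \tau_{j-1} + \pi_{j-1},\ d_{i,j}^{k-1}) \ \ge\ d_{i,j-1}^k + \tau_{j-1} + \pi_{j-1}, \quad j > 1. \qquad (39)$$

By iterating over the above equation and then using Eq. (2) we get

$$d_{i,h}^k \ \ge\ e_{i,1}^k + h \frac{l_i^k}{r_i} + \sum_{m=1}^{h-1} (\tau_m + \pi_m), \qquad (40)$$

which together with Eq. (38) lead us to Eq. (30).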

This  completes  the  proof  of  the  lemma.  □ 
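As a sanity check on Lemma 2, the hop-by-hop recursion of Eq. (25) can be compared against the closed form of Eq. (30) on small inputs. The sketch below is not part of the original analysis: the function names are hypothetical, `service[k]` stands for $l_i^k/r_i$, `hop_delay[m]` stands for $\tau_m + \pi_m$, and the first-hop eligible time is assumed to be the maximum of the arrival time and the previous packet's first-hop deadline.

```python
def jitter_vc_deadlines(arrivals, service, hop_delay):
    """Deadlines d[k][j] of one flow, computed hop by hop.

    arrivals[k]  : arrival time of packet k at the first hop
    service[k]   : l_i^k / r_i for packet k
    hop_delay[m] : tau_m + pi_m between hop m and hop m+1 (0-indexed)
    """
    hops = len(hop_delay) + 1
    d = []
    for k in range(len(arrivals)):
        row = []
        for j in range(hops):
            # Deadline of the previous packet at the same hop (none for k = 0).
            prev_pkt = d[k - 1][j] if k > 0 else float("-inf")
            if j == 0:
                eligible = max(arrivals[k], prev_pkt)
            else:
                # Inner max of Eq. (25).
                eligible = max(row[j - 1] + hop_delay[j - 1], prev_pkt)
            row.append(eligible + service[k])  # Eq. (2)
        d.append(row)
    return d

def closed_form_last_hop(arrivals, service, hop_delay):
    """Last-hop deadlines computed directly from Eq. (30)."""
    hops = len(hop_delay) + 1
    total = sum(hop_delay)
    d_first_prev = d_last_prev = float("-inf")
    out = []
    for a, s in zip(arrivals, service):
        e_first = max(a, d_first_prev)  # eligible time at the first hop
        d_last = max(e_first + hops * s + total, d_last_prev + s)
        d_first_prev, d_last_prev = e_first + s, d_last
        out.append(d_last)
    return out
```

For example, with three packets arriving together, service times 1, 5, 1 and two inter-hop delays of 2 and 3, both functions give last-hop deadlines 8, 21 and 22.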

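Lemma 3 The deadline of any packet $p_i^k$, $k \ge 1$, at the last hop $h$ in a network of CJVC servers
is

$$d_{i,h}^k = \max\left( e_{i,1}^k + h \frac{l_i^k}{r_i} + \sum_{m=1}^{h-1} (\tau_m + \pi_m),\ d_{i,h}^{k-1} + \frac{l_i^k}{r_i} \right). \qquad (41)$$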

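Proof. We consider two cases whether $\delta_i^k = 0$ or not.

Case 1. ($\delta_i^k = 0$) From Eqs. (2) and (6) it follows that

$$d_{i,h}^k = e_{i,1}^k + h \frac{l_i^k}{r_i} + \sum_{m=1}^{h-1} (\tau_m + \pi_m). \qquad (42)$$

On the other hand, by the definition of $\delta_i^k$ (see Ineq. (3) and Eq. (4)) we have
$d_{i,j-1}^k + \tau_{j-1} + \pi_{j-1} + \delta_i^k \ge d_{i,j}^{k-1}$, $\forall j > 1$. From here and from Eq. (2) we obtain

$$d_{i,h}^k \ \ge\ d_{i,h}^{k-1} + \frac{l_i^k}{r_i}. \qquad (43)$$

From this inequality and Eq. (42), Eq. (41) follows.

Case 2. ($\delta_i^k > 0$) By using Eqs. (2) and (10) we obtain

$$\begin{aligned}
d_{i,h}^k & = e_{i,1}^k + h \frac{l_i^k}{r_i} + (h-1)\delta_i^k + \sum_{m=1}^{h-1} (\tau_m + \pi_m) \\
& = e_{i,1}^k + \frac{l_i^k}{r_i} + (h-1)\frac{l_i^k}{r_i} + (h-1)\left( \delta_i^{k-1} + \frac{l_i^{k-1} - l_i^k}{r_i} - \frac{e_{i,1}^k - e_{i,1}^{k-1} - l_i^{k-1}/r_i}{h-1} \right) + \sum_{m=1}^{h-1} (\tau_m + \pi_m) \\
& = e_{i,1}^{k-1} + h \frac{l_i^{k-1}}{r_i} + (h-1)\delta_i^{k-1} + \sum_{m=1}^{h-1} (\tau_m + \pi_m) + \frac{l_i^k}{r_i} \\
& = d_{i,h}^{k-1} + \frac{l_i^k}{r_i}.
\end{aligned} \qquad (44)$$

Since $\delta_i^k > 0$, by using again Eqs. (2) and (7) we get

$$d_{i,h}^k = e_{i,1}^k + h \frac{l_i^k}{r_i} + (h-1)\delta_i^k + \sum_{m=1}^{h-1} (\tau_m + \pi_m) > e_{i,1}^k + h \frac{l_i^k}{r_i} + \sum_{m=1}^{h-1} (\tau_m + \pi_m), \qquad (45)$$

which together with Eq. (44) lead to Eq. (41). □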

Theorem 1 The deadline of a packet at the last hop in a network of CJVC servers is equal to
the deadline of the same packet in a corresponding network of Jitter-VC servers.

Proof. From Eqs. (1) and (2) it is easy to see that in a network of Jitter-VC servers we have

$$d_{i,h}^1 = e_{i,1}^1 + h \frac{l_i^1}{r_i} + \sum_{m=1}^{h-1} (\tau_m + \pi_m). \qquad (46)$$

Similarly, in a network of CJVC servers, from Eqs. (1) and (7), and by using the fact that
$\delta_i^1 = 0$ (see Eq. (8)), we obtain an identical expression for $d_{i,h}^1$ (i.e., Eq. (46)).

Finally, since (a) the eligible times of all packets $p_i^k$ at the first hop, i.e., $e_{i,1}^k$ ($\forall k \ge 1$), are
identical for both Jitter-VC and CJVC servers, and since (b) the deadlines of the packets at
the last hop, i.e., $d_{i,h}^k$ ($\forall k > 1$), are computed based on the same formulae (see Eqs. (30), (41)
and (46)), it follows that $d_{i,h}^k$ ($\forall k \ge 1$) are identical in both a network of Jitter-VC, and a network
of CJVC servers. □
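Theorem 1 can be illustrated numerically by computing last-hop deadlines in two ways: through the Jitter-VC closed form of Eq. (30), and through a per-packet slack variable in the style of CJVC. The sketch below is a hedged illustration, not the paper's implementation. In particular, the slack update used in `cjvc_last_hop` is our reading of the substitution made in the Case 2 derivation above (the main-text equation defining $\delta_i^k$ is outside this appendix), and all function and parameter names are hypothetical. Inputs are integers so that `Fraction` arithmetic stays exact, and at least two hops are assumed.

```python
from fractions import Fraction

def jitter_vc_last_hop(arrivals, service, hop_delay):
    """Last-hop deadlines from the closed form of Eq. (30)."""
    hops = len(hop_delay) + 1
    total = sum(hop_delay)
    d_first_prev = d_last_prev = float("-inf")
    out = []
    for a, s in zip(arrivals, service):
        e_first = max(a, d_first_prev)
        d_last = max(e_first + hops * s + total, d_last_prev + s)
        d_first_prev, d_last_prev = e_first + s, d_last
        out.append(d_last)
    return out

def cjvc_last_hop(arrivals, service, hop_delay):
    """Last-hop deadlines using a per-packet slack delta (assumed update rule)."""
    hops = len(hop_delay) + 1          # requires hops >= 2
    total = sum(hop_delay)
    out = []
    delta = Fraction(0)
    e_prev = d_first_prev = s_prev = None
    for a, s in zip(arrivals, service):
        if e_prev is None:
            e_first, delta = a, Fraction(0)      # first packet: no slack
        else:
            e_first = max(a, d_first_prev)
            # Assumed slack recursion, clamped at zero.
            delta = max(
                Fraction(0),
                delta + (s_prev - s) - Fraction(e_first - e_prev - s_prev, hops - 1),
            )
        out.append(e_first + hops * s + (hops - 1) * delta + total)
        e_prev, d_first_prev, s_prev = e_first, e_first + s, s
    return out
```

On the inputs used in the test cases (for instance three simultaneous packets with service times 1, 5, 1 over three hops), both functions return the same deadlines, which is what the theorem predicts.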


Appendix C: Proof of Theorem 2


To prove Theorem 2 (see Section 3.3) we prove two intermediate results: Lemma 7 which gives
the buffer occupancy for the case when all flows have identical rates, and Lemma 10 which gives
the buffer occupancy for arbitrary flow rates.

Identical  Flow  Rates 

Consider a work-conserving server with the output rate one, which is traversed by $n$ flows with
identical reservations of $1/n$. Assume that the time axis is divided in unit sized slots, where
slot $t$ corresponds to the time interval $[t, t+1)$. Assume that at most one packet can be sent
during each slot, i.e., the packet transmission time is one time unit. Finally, assume that the
starting times of the backlogged periods of any two flows are uncorrelated. In practice, we enforce
this by delaying the first packet of a backlogged period by an amount drawn from a uniform
distribution in the range $[t_{arrival}, t_{arrival} + n)$, where $t_{arrival}$ is the arrival time of the first packet
in the backlogged period. Note that according to Eq. (1), packets’ eligible times during a flow's
backlogged interval are periodic with period $n$. Thus, without loss of generality, we assume that
the arrival process of any flow during a backlogged interval is periodic.

Let $r(t', t'')$ denote the number of packets received (i.e., became eligible) during the interval
$[t', t'')$, and let $s(t', t'')$ denote the number of packets sent during the same interval. Note that
$r(t', t'')$ and $s(t', t'')$ do not include packets received/transmitted during slot $t''$. Let $q(t)$ denote
the size of the queue at the beginning of slot $t$. Then, if no packets are dropped, we have

$$q(t'') = q(t') + r(t', t'') - s(t', t''). \qquad (47)$$

Since at most one packet is sent during each time slot, we have $s(t', t'') \le t'' - t'$. Equality
holds when $[t', t'')$ belongs to a server busy period. A busy period is defined as an interval during
which the server’s queue is never empty. Also, note that if $t'$ is the starting time of a busy period,
then $q(t') = 0$.
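The slotted model and the bookkeeping of Eq. (47) can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: every flow is periodic from time zero with a random phase in $[0, n)$, and the server sends a packet in slot $t$ only if the queue is non-empty at the beginning of that slot. The function name and the `seed` parameter are ours.

```python
import random

def simulate_queue(n, slots, seed=0):
    """Slotted unit-rate server fed by n periodic flows of rate 1/n.

    Returns (r, s, q): per-slot arrivals, per-slot departures, and the
    backlog q[t] at the beginning of slot t (q has slots + 1 entries).
    """
    rng = random.Random(seed)
    phases = [rng.randrange(n) for _ in range(n)]   # one random phase per flow
    r, s, q = [], [], [0]
    for t in range(slots):
        # A flow with phase ph sends at ph, ph + n, ph + 2n, ...
        arrived = sum(1 for ph in phases if t >= ph and (t - ph) % n == 0)
        sent = 1 if q[t] > 0 else 0                 # at most one packet per slot
        r.append(arrived)
        s.append(sent)
        q.append(q[t] + arrived - sent)             # Eq. (47) over a single slot
    return r, s, q

r, s, q = simulate_queue(n=8, slots=64, seed=3)
# Eq. (47) over the interval [10, 40):
balance = q[10] + sum(r[10:40]) - sum(s[10:40])
```

Here `balance` equals `q[40]`, and `sum(s[10:40])` never exceeds the interval length of 30 slots.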

The  next  result  shows  that  to  compute  an  upper  bound  for  q{t)  it  is  enough  to  consider  only 
the  scenarios  in  which  all  flows  are  continuously  backlogged. 

Lemma 4 Let $t_1$ be an arbitrary time slot during a server busy period that starts at time $t_0$.
Assume flow $i$ is not continuously backlogged during the interval $[t_0, t_1)$. Then $q(t_1)$ can only
increase if flow $i$ becomes continuously backlogged during $[t_0, t_1)$.

Proof. Consider two cases whether flow $i$ is idle during the entire interval $[t_0, t_1)$, or not.

If flow $i$ is idle during $[t_0, t_1)$, consider the modified scenario in which flow $i$ becomes backlogged
at an arbitrary time $t < t_0$, and remains continuously backlogged during $[t_0, t_1)$. In
addition, assume that the arrival patterns of all the other flows remain unchanged. As a result,
it is easy to see that in the modified scenario the total number of packets received during
$[t_0, t_1)$ can only increase, while the starting time of the busy interval can only decrease. Let $r'$,
$s'$, and $q'$ denote the corresponding values in the modified scenario. Then $q'(t_0) \ge q(t_0) = 0$,
$r'(t_0, t_1) \ge r(t_0, t_1)$, and $s'(t_0, t_1) = s(t_0, t_1) = t_1 - t_0$. From Eq. (47) it follows then that
$q'(t_1) \ge q(t_1)$.

In the second case, when flow $i$ is neither idle nor continuously backlogged during the interval
$[t_0, t_1)$, let $t'$ denote the time when the last packet of flow $i$ arrives during $[t_0, t_1)$. Next consider the
modified scenario in which flow $i$'s packets arrive at times $t' - na, \ldots, t' - n, t', t' + n, \ldots, t' + nb$,
such that $t' - na \le t_0$, and $t_1 \le t' + nb$. It is easy to see then that the number of packets of flow
$i$ that arrive during $[t_0, t_1)$ is no smaller than the number of packets of flow $i$ that arrive during
the same interval in the original scenario. By assuming that the arrival patterns of all the other
flows do not change, it follows that $r'(t_0, t_1) \ge r(t_0, t_1)$. In addition, since at most $t_1 - t_0$ packets
are transmitted during $[t_0, t_1)$ we have $s'(t_0, t_1) \le t_1 - t_0$. The inequality is strict if, after changing
the arrival pattern of flow $i$, the server is no longer busy during the entire interval $[t_0, t_1)$. In
addition, we have $q'(t_0) \ge 0$, and from the hypothesis $q(t_0) = 0$. Finally, from Eq. (47) we obtain
$q'(t_1) \ge q(t_1)$, which concludes the proof of the lemma. □

Consequently, in the remainder of this section we limit our study to a busy period in which
all flows are continuously backlogged.

Let $t_1$ be the time when the last flow becomes backlogged. Let $t_0$ be the latest time no larger
than $t_1$ when the server becomes busy, i.e., it has no packet to send during $[t_0 - 1, t_0)$ and is
continuously busy during the interval $[t_0, t_1 + 1)$. Then we have the following result.

Lemma 5 If all flows remain continuously backlogged after time $t_1$, the server is busy for any
time $t \ge t_0$.

Proof. By the definition of $t_0$, the server is busy during $[t_0, t_1)$. Next we show that the server is
also busy for any $t \ge t_1$.

Consider a flow that becomes backlogged at time $t'$. Since its arrival process is periodic
it follows that during any interval $[t' - n + i, t' + i)$, $\forall i > 0$, exactly one packet of this flow
arrives. Since after time $t_1$ all $n$ flows are backlogged, exactly $n$ packets are received during
$[t_1 - n + i, t_1 + i)$, $\forall i > 0$. Since at most $n$ packets are sent during each of these intervals it
follows that the server cannot be idle during any slot $t_1 + i$. □

Consider a buffer of size $s$. Our goal is to compute the probability with which the buffer
overflows during an arbitrary interval $[t_0, t_0 + d)$. From Lemma 5 it follows that since the server
is busy during $[t_0, t_0 + d)$, exactly $d$ packets are transmitted during this interval. In addition,
since the starting times of flows’ backlogged periods are not correlated, in the following we also
assume that the starting time of a flow’s backlogged period is not correlated with the starting
time, $t_0$, of a busy period. Thus, during the interval $[t_0, t_0 + d)$, a flow receives $\lceil d/n \rceil$ packets
with probability $p(d) = d/n - \lfloor d/n \rfloor$, and $\lfloor d/n \rfloor$ packets with probability $1 - p(d)$. Since this probability is
periodic with period $n$ it will suffice to consider only intervals of size at most $n$. Consequently,
in the following we assume $d \le n$. The probability to receive one packet during $[t_0, t_0 + d)$ is
then

$$p(d) = \frac{d}{n}. \qquad (48)$$

Let $p(m; d)$ denote the probability with which exactly $m$ packets are received during the time
interval $[t_0, t_0 + d)$, where

$$p(m; d) = \binom{n}{m} p(d)^m (1 - p(d))^{n-m}. \qquad (49)$$

Now, let $P(x > s, u)$ denote the probability with which the queue size exceeds $s$ at time $t_0 + u$.
Since the server is idle at $t_0$ and busy during $[t_0, t_0 + u)$, from Eq. (47) it follows that the server’s
queue overflows when more than $u + s$ packets are received during $[t_0, t_0 + u)$. Thus, we have

$$P(x > s, u) = \sum_{i=u+s+1}^{n} p(i; u). \qquad (50)$$
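Eqs. (48) to (50) describe a binomial tail, which can be evaluated exactly for moderate $n$. The following sketch is only meant to make the definitions concrete; the function names are ours, and it assumes integer $d$, $u$ and $s$ with $d, u \le n$.

```python
from math import comb

def p_one(d, n):
    """Eq. (48): probability that a given flow receives a packet in [t0, t0 + d)."""
    return d / n

def p_exact(m, d, n):
    """Eq. (49): probability that exactly m packets arrive in [t0, t0 + d)."""
    p = p_one(d, n)
    return comb(n, m) * p**m * (1 - p) ** (n - m)

def overflow_probability(s, u, n):
    """Eq. (50): probability that more than u + s packets arrive in [t0, t0 + u)."""
    return sum(p_exact(i, u, n) for i in range(u + s + 1, n + 1))
```

For instance, `overflow_probability(10, 30, 100)` is the probability that more than 40 of 100 flows contribute a packet during a 30-slot interval. When `u + s >= n` the sum is empty and the function returns 0, since at most $n$ packets can arrive.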

The next result bounds $P(x > s, u)$.


Lemma 6 The probability that a queue of size $s$ overflows at time $t_0 + u$ is bounded by

$$P(x > s, u) < \beta(n) \sqrt{\frac{1}{2\pi}} \left( \frac{1 - (s-1)/2n}{1 + (s-1)/2n} \right)^{2s} \frac{(n+s)^2}{4sn}, \qquad (51)$$

where $\beta(n) = (n/e)^{1/(12n)}$.

Proof. From Eq. (49) we obtain

$$p(m+1; u) = \frac{p(u)}{1 - p(u)} \cdot \frac{n - m}{m + 1} \cdot p(m; u). \qquad (52)$$

By plugging the above equation and Eq. (48) into Eq. (50) we obtain

$$P(x > s, u) = \sum_{i=u+s+1}^{n} p(i; u) = p(u+s; u) \sum_{i=u+s+1}^{n} \prod_{k=u+s}^{i-1} \left( \frac{u}{n-u} \cdot \frac{n-k}{k+1} \right) < p(u+s; u) \sum_{i=u+s+1}^{n} \left( \frac{u}{u+s} \cdot \frac{n-u-s}{n-u} \right)^{i-u-s}. \qquad (53)$$

Next, it can be easily verified that for any positive reals $a$, $b$, and $x$, such that $b - x \ge 0$, we have

$$\frac{a}{a+x} \cdot \frac{b-x}{b} \le \left( \frac{a+b-x}{a+b+x} \right)^2. \qquad (54)$$

By taking $a = u$, $b = n - u$, $x = s$, Eq. (53) becomes

$$P(x > s, u) < p(u+s; u) \sum_{i=u+s+1}^{n} \left( \frac{n-s}{n+s} \right)^{2(i-u-s)} < p(u+s; u) \sum_{j=0}^{\infty} \left( \frac{n-s}{n+s} \right)^{2j} = p(u+s; u) \frac{(n+s)^2}{4sn}. \qquad (55)$$

Next, it remains to bound $p(u+s; u)$. From Eqs. (48) and (52) we have

$$p(u+s; u) = p(u; u) \prod_{i=0}^{s-1} \left( \frac{u}{n-u} \cdot \frac{n-u-i}{u+i+1} \right) < p(u; u) \prod_{i=0}^{s-1} \left( \frac{u}{u+i} \cdot \frac{n-u-i}{n-u} \right). \qquad (56)$$

By using Ineq. (54) with $a = u$, $b = n - u$, and $x = i$, we obtain

$$p(u+s; u) < p(u; u) \prod_{i=0}^{s-1} \left( \frac{n-i}{n+i} \right)^2 = p(u; u) \prod_{i=0}^{s/2-1} \left( \frac{n-i}{n+i} \cdot \frac{n-s+1+i}{n+s-1-i} \right)^2. \qquad (57)$$

Again, by applying Ineq. (54) to the pairs $(n-i)/(n+i)$ and $(n-s+1+i)/(n+s-1-i)$,
$\forall i < s/2$, we have

$$p(u+s; u) < p(u; u) \left( \frac{n - (s-1)/2}{n + (s-1)/2} \right)^{2s} = p(u; u) \left( \frac{1 - (s-1)/2n}{1 + (s-1)/2n} \right)^{2s}. \qquad (58)$$

To bound $p(u; u)$ we use Stirling inequalities [8], i.e., $\sqrt{2\pi n}\,(n/e)^n < n! < \sqrt{2\pi n}\,(n/e)^{n + 1/(12n)}$,
$\forall n \ge 1$. From here we have

$$\binom{n}{u} < \frac{\sqrt{2\pi n}\,(n/e)^{n + 1/(12n)}}{\sqrt{2\pi u}\,(u/e)^u \sqrt{2\pi (n-u)}\,((n-u)/e)^{n-u}} = \sqrt{\frac{n}{2\pi (n-u) u}} \cdot \frac{n^n (n/e)^{1/(12n)}}{u^u (n-u)^{n-u}}. \qquad (59)$$

By combining Eqs. (48), (49) and (59), we obtain

$$p(u; u) < \beta(n) \sqrt{\frac{n}{2\pi (n-u) u}} \le \beta(n) \sqrt{\frac{1}{2\pi}}, \qquad (60)$$

where $\beta(n) = (n/e)^{1/(12n)}$, and the last inequality follows from the fact that $n/((n-u)u) \le 1$,
for any $u > 1$, $n > 2$. By plugging the above result in Eq. (55) we obtain

$$P(x > s, u) < \beta(n) \sqrt{\frac{1}{2\pi}} \left( \frac{1 - (s-1)/2n}{1 + (s-1)/2n} \right)^{2s} \frac{(n+s)^2}{4sn}.$$
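To see how loose or tight the bound of Lemma 6 is, it can be compared with the exact binomial tail of Eq. (50) for a few parameter choices. The sketch below is a numerical illustration only. It takes $\beta(n) = (n/e)^{1/(12n)}$ and the exponent $2s$ as read from the proof, which is an assumption about the garbled source, and the function names are ours.

```python
import math

def beta(n):
    """beta(n) as it appears in the proof (assumed form)."""
    return (n / math.e) ** (1 / (12 * n))

def lemma6_bound(s, n):
    """Right-hand side of Eq. (51); note that it does not depend on u."""
    x = (s - 1) / (2 * n)
    ratio = (1 - x) / (1 + x)
    return beta(n) * math.sqrt(1 / (2 * math.pi)) * ratio ** (2 * s) * (n + s) ** 2 / (4 * s * n)

def exact_tail(s, u, n):
    """Eq. (50): exact probability that more than u + s packets arrive."""
    p = u / n
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(u + s + 1, n + 1))

# Compare the two for n = 100 flows at the midpoint u = 50.
rows = [(s, exact_tail(s, 50, 100), lemma6_bound(s, 100)) for s in (5, 10, 20, 30)]
```

In each row of `rows` the exact tail is below the bound. The bound is very loose for small $s$ (it exceeds 1 at $s = 5$) and only becomes informative as $s$ grows.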


Lemma 7 Consider $n$ flows with identical rates and unit packet sizes. Then given a buffer of
size $s$, where

$$s \ge \sqrt{n \left( \frac{\ln n}{2} - \frac{\ln \varepsilon}{2} - 1 \right)},$$

the probability that the buffer overflows during an arbitrary time slot when the server is busy is
asymptotically $< \varepsilon$.

Proof. To compute the asymptotic bound for $P(x > s, u)$ assume that $s \ll n$. Since
$(1-x)/(1+x) \approx 1 - 2x$ and $\ln(1-x) \approx -x$, for $x \to 0$, and since $(n+s)^2/(sn) < n$ for $n > s \ge 4$, and
$\beta(n) < 1.102$ for any $n \ge 1$, by using Eq. (51) we obtain (more precisely, $\ln(\beta(n)\sqrt{1/(2\pi)}) - \ln 4 < -2.2081062$)

$$\begin{aligned}
\ln P(x > s, u) & \approx \ln\left( \beta(n) \sqrt{\frac{1}{2\pi}} \right) + 2s \cdot \ln \frac{1 - (s-1)/2n}{1 + (s-1)/2n} + \ln n - \ln 4 \\
& \approx \ln\left( \beta(n) \sqrt{\frac{1}{2\pi}} \right) + 2s \cdot \ln\left( 1 - \frac{s-1}{n} \right) + \ln n - \ln 4 \\
& \approx \ln\left( \beta(n) \sqrt{\frac{1}{2\pi}} \right) - 2s \frac{s-1}{n} + \ln n - \ln 4 \\
& < -2 - 2\frac{s(s-1)}{n} + \ln n \approx 2\left( -1 - \frac{s^2}{n} \right) + \ln n.
\end{aligned}$$

Using $\varepsilon$ to bound $P(x > s, u)$ leads us to

$$P(x > s, u) \le \varepsilon \ \Longleftarrow\ 2\left( -1 - \frac{s^2}{n} \right) + \ln n \le \ln \varepsilon \ \Longrightarrow\ s \ge \sqrt{n \left( \frac{\ln n}{2} - \frac{\ln \varepsilon}{2} - 1 \right)}.$$

□

Next we prove a stronger result by computing an asymptotic upper bound for the probability
with which a queue of size $s$ overflows during an arbitrary busy interval. Let $Q(x > s)$ denote
this probability. The key observation is that since all flows have period $n$, the aggregate arrival
traffic will have the same period $n$. In addition, since during each of these periods exactly $n$
packets are received/transmitted it follows that the queue size at any time $t_0 + i \cdot n + j$ is the
same, $\forall i, j \ge 0$. Consequently, if the queue does not overflow during $[t_0, t_0 + n)$, the queue will
not overflow at any other time $t > t_1$ during the same busy period. Thus, the problem reduces
to computing the probability of the queue overflowing during the interval $[t_0, t_0 + n)$. Then we have
the following result.

Corollary 1 Consider $n$ flows with identical rates and unit packet sizes. Then given a buffer of
size $s$, where

$$s \ge \sqrt{n (\ln n - (\ln \varepsilon)/2 - 1)}, \qquad (65)$$

the probability that the buffer overflows during an arbitrary busy interval is asymptotically $< \varepsilon$.
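The buffer sizes in Lemma 7 and Eq. (65) are simple closed forms and are easy to tabulate. The sketch below evaluates both. The function names are ours, the per-slot expression is the one stated in Lemma 7 as reconstructed here, and both results are asymptotic (they assume $s \ll n$), so small values of $n$ should be read with care.

```python
import math

def buffer_per_slot(n, eps):
    """Lemma 7: buffer size for per-slot overflow probability below eps."""
    return math.sqrt(n * (math.log(n) / 2 - math.log(eps) / 2 - 1))

def buffer_per_busy_interval(n, eps):
    """Eq. (65): buffer size for overflow probability below eps over a busy interval."""
    return math.sqrt(n * (math.log(n) - math.log(eps) / 2 - 1))

# Example: 1000 flows, overflow probability target of one in a million.
slot_buffer = buffer_per_slot(1000, 1e-6)
interval_buffer = buffer_per_busy_interval(1000, 1e-6)
```

Both sizes grow roughly as $\sqrt{n \ln n}$, far below the $n$ packets a worst-case analysis would require. Evaluating the per-slot formula at $\varepsilon/n$ reproduces Eq. (65), which is the substitution the corollary relies on.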

Proof. Let $\varepsilon'$ be the probability that a buffer of size $s$ overflows at an instant $t$ during the busy
interval $[t_0, t_0 + n)$. Then the probability that the buffer overflows during this interval is smaller
than $1 - (1 - \varepsilon')^n < n \cdot \varepsilon'$. Now, recall that if the buffer does not overflow during $[t_0, t_0 + n)$, the
buffer will not overflow after time $t_0 + n$. Thus the probability that the buffer will overflow
during an arbitrary busy period is less than $n\varepsilon'$. Finally, let $\varepsilon = n \cdot \varepsilon'$, and apply the result of
Lemma 7 for $\varepsilon'$, i.e.,

$$s \ge \sqrt{n \left( \frac{\ln n}{2} - \frac{\ln(\varepsilon/n)}{2} - 1 \right)} = \sqrt{n \left( \ln n - \frac{\ln \varepsilon}{2} - 1 \right)}. \qquad (66)$$

□


Arbitrary Flow Rates

In  this  section  we  determine  the  buffer  bound  for  a  system  in  which  packets  are  of  unit  size, 
but  the  reservations  can  be  arbitrary.  The  basic  idea  is  to  use  a  succession  of  transformations  to 
reduce  the  problem  to  the  case  in  which  the  probabilities  associated  to  the  flows  can  take  at  most 
three  distinct  values,  and  then  to  apply  the  results  from  the  previous  case  when  all  reservations 
are  assumed  to  be  identical. 

Consider $n$ flows, and let $r_k$ denote the rate reserved by flow $k$, where

$$\sum_{k=1}^{n} r_k = 1. \qquad (67)$$

Consider again the case when all flows are continuously backlogged. Let $t_0$ denote the starting
time of a busy period. Since the time when flow $k$ becomes backlogged is assumed to be
independent of $t_0$, it follows that during the interval $[t_0, t_0 + d)$ flow $k$ receives exactly $\lfloor d \cdot r_k \rfloor + 1$
packets with probability

$$p_k(d) = d \cdot r_k - \lfloor d \cdot r_k \rfloor, \qquad (68)$$

and $\lfloor d \cdot r_k \rfloor$ packets with probability $1 - p_k(d)$.

Let $p(m; d)$ denote the probability with which the server receives exactly $\sum_{k=1}^{n} \lfloor d \cdot r_k \rfloor + m$
packets during the interval $[t_0, t_0 + d)$. Then

$$p(m; d) = \pi_m(p_1(d), p_2(d), \ldots, p_n(d)), \qquad (69)$$

where $\pi_m(p_1(d), p_2(d), \ldots, p_n(d))$ is the coefficient of $x^m$ in the expansion of

$$\prod_{i=1}^{n} (x\, p_i(d) + (1 - p_i(d))). \qquad (70)$$

Note that when all flows have equal reservations, i.e., $r_k = 1/n$, $1 \le k \le n$, Eq. (69) reduces to
Eq. (49).

By using Eq. (68) the number of packets received during $[t_0, t_0 + d)$ can be written as

$$\sum_{k=1}^{n} \lfloor d \cdot r_k \rfloor + m = \sum_{k=1}^{n} (d \cdot r_k - p_k(d)) + m = d - \sum_{k=1}^{n} p_k(d) + m. \qquad (71)$$

Since $t_0$ is the starting time of the busy period and since the server remains busy during $[t_0, t_0 + d)$,
from Eq. (47) it follows that $q(t_0 + d) = m - \sum_{k=1}^{n} p_k(d)$.
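The coefficients $\pi_m$ of Eqs. (69) and (70) can be obtained by multiplying out the product one flow at a time. The sketch below does exactly that; it is an illustration with hypothetical function names, and it uses floating-point arithmetic, so results are approximate.

```python
import math

def fractional_parts(rates, d):
    """Eq. (68): p_k(d) = d * r_k - floor(d * r_k) for each flow k."""
    return [d * r - math.floor(d * r) for r in rates]

def arrival_distribution(probs):
    """Coefficients of prod_i (x * p_i + (1 - p_i)), Eq. (70).

    Entry m of the result is pi_m(p_1, ..., p_n), i.e. p(m; d) in Eq. (69).
    """
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for m, c in enumerate(coeffs):
            nxt[m] += c * (1 - p)   # flow contributes floor(d * r_k) packets
            nxt[m + 1] += c * p     # flow contributes one extra packet
        coeffs = nxt
    return coeffs

# Three flows with rates 1/2, 1/4, 1/4 observed over d = 3 slots.
probs = fractional_parts([0.5, 0.25, 0.25], 3)
dist = arrival_distribution(probs)
```

Here `probs` is `[0.5, 0.75, 0.75]`, and `dist[m]` is the probability of receiving $m$ packets beyond the guaranteed $\sum_k \lfloor d \cdot r_k \rfloor$. When all entries of `probs` are equal the result coincides with the binomial distribution of Eq. (49).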

Similarly, the probability $P(x > s, u)$ to overflow a queue of size $s$ at time $t_0 + u$ is

$$P(x > s, u) = \sum_{i=v+1}^{n} p(i; u), \qquad (72)$$

where $v = \sum_{k=1}^{n} p_k(u) + s$.

Since in the following $p_k(u)$ is always defined over $[t_0, t_0 + u)$ we will drop the argument from
the $p_k(u)$’s notation. Next, note that for any two flows $k$ and $l$, $p(m; u)$ can be rewritten as

$$p(m; u) = p_k p_l A_{k,l}(m) + (p_k(1 - p_l) + (1 - p_k)p_l) B_{k,l}(m) + (1 - p_k)(1 - p_l) C_{k,l}(m), \qquad (73)$$

where $p_k p_l A_{k,l}(m)$ represents all terms in $\pi_m(p_1, p_2, \ldots, p_n)$ that contain $p_k p_l$,
$(p_k(1 - p_l) + (1 - p_k)p_l) B_{k,l}(m)$ represents all terms that contain either $p_k(1 - p_l)$ or $(1 - p_k)p_l$,
and $(1 - p_k)(1 - p_l) C_{k,l}(m)$ represents all terms that contain $(1 - p_k)(1 - p_l)$.

From Eqs. (72) and (73), the probability to overflow a queue of size $s$ at time $t_0 + u$ is then

$$\begin{aligned}
P(x > s, u) & = \sum_{i=v+1}^{n} p(i; u) \\
& = p_k p_l \cdot A_{k,l}(v, n) + (p_k(1 - p_l) + (1 - p_k)p_l) \cdot B_{k,l}(v, n) + (1 - p_k)(1 - p_l) \cdot C_{k,l}(v, n),
\end{aligned} \qquad (74)$$

where $A_{k,l}(v, n) = \sum_{i=v+1}^{n} A_{k,l}(i)$, $B_{k,l}(v, n) = \sum_{i=v+1}^{n} B_{k,l}(i)$, and $C_{k,l}(v, n) = \sum_{i=v+1}^{n} C_{k,l}(i)$,
respectively.

Our next goal is to reduce the problem of bounding $P(x > s, u)$ to the case in which the
flows’ probabilities take a limited number of values. This makes it possible to use the results from
the homogeneous reservations case without compromising the bound quality too much. The idea
is to iteratively modify the values of the flows’ probabilities, without decreasing $P(x > s, u)$. In
particular, we consider the following simple transformation: select two probabilities $p_k$ and $p_l$
and update them as follows:

$$p'_k = p_k - \delta, \qquad p'_l = p_l + \delta, \qquad (75)$$

where $\delta$ is a real value such that $0 \le p'_l, p'_k \le 1$, and the new computed probability

$$P'(x > s, u) = p'_k p'_l \cdot A_{k,l}(v, n) + (p'_k(1 - p'_l) + (1 - p'_k)p'_l) \cdot B_{k,l}(v, n) + (1 - p'_k)(1 - p'_l) \cdot C_{k,l}(v, n) \qquad (76)$$

is greater or equal to $P(x > s, u)$.

It is interesting to note that performing transformation (75) is equivalent to defining a new
system in which the reservations of flows $k$ and $l$ are changed to $r'_k$ and $r'_l$, respectively, such that
$p'_k = d \cdot r'_k - \lfloor d \cdot r'_k \rfloor$, $p'_l = d \cdot r'_l - \lfloor d \cdot r'_l \rfloor$. There are two observations worth noting about this
system. First, by choosing $r'_k = r_k - \delta/d$ and $r'_l = r_l + \delta/d$ we maintain the invariant $\sum_{i=1}^{n} r_i = 1$.
Second, while in the new system the start time $t_0$ of the busy period may change, this will not
influence $P'(x > s, u)$ as this depends only on the length of the interval $[t_0, t_0 + u)$.

Next, we give the details of our transformation. From Eqs. (74), (75) and (76), after some
simple algebra, we obtain

$$P'(x > s, u) - P(x > s, u) = \delta (p_k - p_l - \delta) D_{k,l}(v, n), \qquad (77)$$

where

$$D_{k,l}(v, n) = A_{k,l}(v, n) - 2 B_{k,l}(v, n) + C_{k,l}(v, n). \qquad (78)$$

Recall that our goal is to choose $\delta$ such that $P'(x > s, u) \ge P(x > s, u)$. Without loss of
generality assume that $p_k \ge p_l$. We consider two cases: (1) if $D_{k,l}(v, n) \ge 0$, then $\delta \ge 0$ and
$p_k \ge p_l + \delta$ ($\delta < 0$ and $p_k \le p_l + \delta$ cannot be simultaneously true); (2) if $D_{k,l}(v, n) < 0$, then
either $\delta > 0$ and $p_k < p_l + \delta$, or $\delta < 0$ and $p_k > p_l + \delta$.
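The identity in Eq. (77) can be checked numerically on a small example. In the sketch below, the three tail sums are computed from the distribution of the other $n - 2$ flows, and $D_{k,l}$ is taken to be $A_{k,l} - 2B_{k,l} + C_{k,l}$, which is what the algebra from Eqs. (74) to (76) gives. All function names are ours, and the threshold `v` is assumed to be an integer count of extra packets, which simplifies the paper's definition of $v$.

```python
def distribution(probs):
    """Coefficients of prod_i (x * p_i + (1 - p_i)), as in Eq. (70)."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for m, c in enumerate(coeffs):
            nxt[m] += c * (1 - p)
            nxt[m + 1] += c * p
        coeffs = nxt
    return coeffs

def overflow(probs, v):
    """Eq. (72): probability that more than v extra packets arrive."""
    return sum(distribution(probs)[v + 1:])

def d_term(probs, k, l, v):
    """A - 2B + C for flows k and l, from the other flows' distribution."""
    others = distribution([p for i, p in enumerate(probs) if i not in (k, l)])
    tail = lambda start: sum(others[max(start, 0):])
    a = tail(v - 1)   # k and l both add a packet: the others need at least v - 1
    b = tail(v)       # exactly one of them adds a packet: the others need at least v
    c = tail(v + 1)   # neither adds a packet: the others need at least v + 1
    return a - 2 * b + c

def transform(probs, k, l, delta):
    """Transformation (75): move delta from p_k to p_l."""
    out = list(probs)
    out[k] -= delta
    out[l] += delta
    return out

probs, k, l, v, delta = [0.2, 0.5, 0.7, 0.4], 2, 0, 2, 0.1
lhs = overflow(transform(probs, k, l, delta), v) - overflow(probs, v)
rhs = delta * (probs[k] - probs[l] - delta) * d_term(probs, k, l, v)
```

Up to rounding error, `lhs` and `rhs` agree, so the sign of $D_{k,l}(v, n)$ alone decides whether pulling $p_k$ and $p_l$ together or pushing them apart increases the overflow probability.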

Let $p_{min} = \min_{1 \le i \le n} p_i$ and $p_{max} = \max_{1 \le i \le n} p_i$, respectively. Consider the following three
subsets, denoted $U$, $V$, and $M$, where $U$ contains all flows $k$ such that $p_k = p_{min}$, $V$ contains all
flows $k$ such that $p_k = p_{max}$, and $M$ contains all the other flows. The idea is then to successively
apply the transformation (75) on $p_1, p_2, \ldots, p_n$, until the probabilities of all flows in $M$ become
equal. In this way we reduce the problem to the case in which the probabilities $p_k$ can take
at most three distinct values: $p_{min}$, $p_{max}$, and $p_M$, where $p_k = p_M$, $\forall k \in M$. Figure 15 shows
the iterative algorithm to achieve this. Lemmas 8 and 9 prove that by using the algorithm in
Figure 15, $p_1, p_2, \ldots, p_n$ converge asymptotically to the three values.

while (|M| > 1) do /* while size of M is greater than one */
    p_l = min_{i ∈ M}(p_i);
    p_k = max_{i ∈ M}(p_i);
    if (D_{k,l}(v, n) ≥ 0)
        p_k = p_l = (p_k + p_l)/2;
    else
        δ = max(p_k − p_max, p_min − p_l);
        p_k = p_k − δ; p_l = p_l + δ;
        if (p_l = p_min)
            M = M \ {l}; U = U ∪ {l};
        if (p_k = p_max)
            M = M \ {k}; V = V ∪ {k};

Figure 15: Reducing $p_1, p_2, \ldots, p_n$ to three distinct values.

Lemma 8 After an iteration of the algorithm in Figure 15 either the size of $M$ decreases by one,
or the standard deviation of the probabilities in $M$ decreases by a factor of at least $(1 - \frac{1}{2|M|})$.


Proof. The first part is trivial; if $D_{k,l}(v, n) < 0$ the size of $M$ decreases by one. For the second
part, let $\bar{p}$ denote the average value of the probabilities associated to the flows in $M$, i.e.,

$$\bar{p} = \frac{1}{|M|} \sum_{i \in M} p_i. \qquad (79)$$

The standard deviation associated to the probabilities in $M$ is

$$dev = \sum_{i \in M} (p_i - \bar{p})^2. \qquad (80)$$

After averaging probabilities $p_k$ and $p_l$, the standard deviation changes to

$$dev' = dev + 2\left( \frac{p_k + p_l}{2} - \bar{p} \right)^2 - (p_k - \bar{p})^2 - (p_l - \bar{p})^2 = dev - \frac{(p_k - p_l)^2}{2}. \qquad (81)$$

Since $p_l$ and $p_k$ are the lowest and, respectively, the highest probabilities in $M$ we have $(p_i - \bar{p})^2 \le
(p_l - p_k)^2$, $\forall i \in M$. From here and from Eqs. (80) and (81) we have

$$dev = \sum_{i \in M} (p_i - \bar{p})^2 \le |M| (p_l - p_k)^2 = 2|M| (dev - dev') \ \Longrightarrow\ dev' \le dev \left( 1 - \frac{1}{2|M|} \right). \qquad (82)$$

□
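The averaging step analysed in Lemma 8 is easy to check numerically. The sketch below computes the quantity the proof calls $dev$ (a sum of squared deviations, Eq. (80)), replaces the smallest and largest values by their mean, and compares the result with Eqs. (81) and (82). Function names are ours and the input values are arbitrary.

```python
def dev(values):
    """Eq. (80): sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((p - mean) ** 2 for p in values)

def average_extremes(values):
    """Replace the smallest and largest entries by their average."""
    out = list(values)
    lo, hi = out.index(min(out)), out.index(max(out))
    out[lo] = out[hi] = (values[lo] + values[hi]) / 2
    return out

m_probs = [0.1, 0.4, 0.45, 0.9]
before = dev(m_probs)
after = dev(average_extremes(m_probs))
drop = (max(m_probs) - min(m_probs)) ** 2 / 2   # predicted by Eq. (81)
```

In this example `before - after` matches `drop` up to rounding, and `after` is well below `before * (1 - 1 / (2 * len(m_probs)))`, the guarantee of Eq. (82). The mean is unchanged by the averaging step, which is why only the two modified terms appear in Eq. (81).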


Lemma 9 Consider $n$ flows, and let $p_i$ denote the probability associated to flow $i$. Then, by using
the algorithm in Figure 15, the probabilities $p_i$ ($1 \le i \le n$) converge to at most three values.

Proof. Let $\varepsilon$ be an arbitrarily small real. The idea is then to show that after a finite number of
iterations of the algorithm in Figure 15, the standard deviation of the $p_i$’s ($i \in M$) becomes smaller
than $\varepsilon$.

The standard deviation for the probabilities of flows in $M$ is trivially bounded as follows

$$dev = \sum_{i \in M} (p_i - \bar{p})^2 \le \sum_{i \in M} (p_{max} - p_{min})^2 = |M| (p_{max} - p_{min})^2 \le n (p_{max} - p_{min})^2. \qquad (83)$$

Assume $D_{k,l}(v, n) \ge 0$ (i.e., $M$ does not change) for $n_1$ consecutive iterations. Then, by using
Lemma 8, it is easy to see that $n_1$ is bounded above by $N$, where

$$dev \cdot \left( 1 - \frac{1}{2n} \right)^N = \varepsilon \ \Longrightarrow\ N = \frac{\ln \varepsilon - \ln dev}{\ln(1 - 1/(2n))}. \qquad (84)$$

Since the above bound, $N$, holds for any set $M$, it follows that after $nN$ iterations, we are
guaranteed that either set $M$ becomes empty, in which case the lemma is trivially true, or $dev < \varepsilon$.
□

Thus, we have reduced the problem to computing an upper bound for the probability $P(x > s, u)$
in a system in which probabilities take only three values at time $u$: $p_{min}$, $p_{max}$, and $p_M$.
Next we give the main result of this section.

Lemma 10 Consider $n$ flows with unit packet sizes and arbitrary flow reservations. Then given
a buffer of size $s$, where


the probability that the buffer overflows in an arbitrary time slot during a server busy period is
asymptotically $< \varepsilon$.

Proof.  Consider  the  probability,  P{x  >  s,u),  with  which  the  queue  overflows  at  time  to +  «  (see 
Eq.  (72)).  Next,  by  using  the  algorithm  in  Figure  15,  we  reduce  probabilities  pfs  (1  <  «  <  n) 
to  three  values:  Pmin.Pmax,  and  pM,  respectively.  Let  p{  denote  the  final  probability  of  flow  i, 
and  let  P^ {x  >  s,  u)  denote  the  final  probability  of  the  queue  overflowing  at  time  U  +  u.  More 
precisely,  from  Eqs.  (72)  and  (69)  we  have 

P^{x>S,u)=  P^ii',u)=  ^2  Tn{pi,P2,...,Pn),  (86) 

i=v+l 

where  v  =  YX.=\  Pk{'^)  +  Pi  ~  Pmin-,  'ii  pi  =  Pmax:  Vi  G  V,  and  pi  =  pM-,  Vi  G  V .  Since 

after  each  transformation  P{x  >  s,u)  can  only  increase,  we  have  P^{x  >  s,u)  >  P{x  >  s,u). 

Let  nt7,  ny,  and  nM  be  the  number  of  flows  in  sets  [/,  V,  and  M,  respectively.  Define  integers 
Uu,  vv,  and  um?  such  that  v  =  Vy,  vy  vm-,  and  vjj  <  nu,  vy  <  ny,  and  vm  <  respectively. 
Then,  it  can  be  shown  that 

P-f{x>s,u)  <  Pu  +  Pv  +  Pm,  (87) 

where 

ft,  =  E  (88) 

i=zvu  +  l  \  ^  / 

Pv  =  Y.  {’'J)pLAI  - 

Pm  =  Y  (j)lfM(l-PMr“^‘- 

i=VM+l  V  *  / 
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Due  to  the  notation  complexity  we  omit  the  derivation  of  Ineq.  (87).  Instead,  below  we  give  an 
alternate  method  that  achieves  the  same  result. 

The key observation is that $P_U$ represents the probability with which more than
$\sum_{i \in U} \lfloor u \cdot r_i \rfloor + v_U$ packets from flows in $U$ arrive during the interval $[t_0, t_0 + u)$. This is easy to see, as
the probability that exactly $\sum_{i \in U} \lfloor u \cdot r_i \rfloor + m$ packets from flows in $U$ arrive during $[t_0, t_0 + u)$ is
$\binom{n_U}{m} p_{min}^m (1 - p_{min})^{n_U - m}$ (see Eq. (69) for comparison).

Similarly, $P_V$ is the probability that more than $\sum_{i \in V} \lfloor u \cdot r_i \rfloor + v_V$ packets from flows in $V$
arrive during $[t_0, t_0 + u)$, while $P_M$ is the probability that more than $\sum_{i \in M} \lfloor u \cdot r_i \rfloor + v_M$ packets
from flows in $M$ arrive during the same interval.

Consequently, $(1 - P_U)(1 - P_V)(1 - P_M)$ represents the probability with which no more than
$\sum_{i \in U} \lfloor u \cdot r_i \rfloor + v_U$, $\sum_{i \in V} \lfloor u \cdot r_i \rfloor + v_V$, and $\sum_{i \in M} \lfloor u \cdot r_i \rfloor + v_M$ packets are received from flows in $U$, $V$,
and $M$, respectively, during $[t_0, t_0 + u)$. Clearly this probability is no larger than the probability of receiving
no more than $\sum_{i=1}^{n} \lfloor u \cdot r_i \rfloor + v$ packets from all flows during the interval $[t_0, t_0 + u)$, a probability
which is exactly $1 - P^f(x > s, u)$. This yields

$$1 - P^f(x > s, u) \ge (1 - P_U)(1 - P_V)(1 - P_M) \;\Rightarrow \tag{89}$$

$$P^f(x > s, u) \le 1 - (1 - P_U)(1 - P_V)(1 - P_M) \le P_U + P_V + P_M.$$
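The inequality chain in (89) can be checked numerically on a small instance. The following Python sketch is illustrative only: the group sizes, per-flow probabilities, and thresholds are made-up values, and the helper names are hypothetical. It models the number of extra packets from each group as a binomial count, computes the exact tails $P_U$, $P_V$, and $P_M$, and compares them with the exact probability that the total exceeds $v = v_U + v_V + v_M$.

```python
from math import comb

def binom_pmf(n, p):
    """Return [P(X = k) for k = 0..n] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def tail(pmf, v):
    """P(X > v) for a distribution given as a list of point masses."""
    return sum(pmf[v + 1:])

def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Hypothetical instance: group -> (number of flows, per-flow probability, threshold).
groups = {"U": (20, 0.1, 5), "V": (15, 0.6, 12), "M": (10, 0.3, 6)}

pmfs = {g: binom_pmf(n, p) for g, (n, p, _) in groups.items()}
P_U, P_V, P_M = (tail(pmfs[g], groups[g][2]) for g in ("U", "V", "M"))

# Exact distribution of the total count over all three groups.
total = convolve(convolve(pmfs["U"], pmfs["V"]), pmfs["M"])
v = sum(vg for _, _, vg in groups.values())
P_f = tail(total, v)
product_bound = 1 - (1 - P_U) * (1 - P_V) * (1 - P_M)

print(P_f, product_bound, P_U + P_V + P_M)
```

The three printed values should appear in non-decreasing order, mirroring the two inequalities above. The sum bound is loose when the individual tails are large, but in the regime of interest the tails are small and the product and the sum nearly coincide.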

Next, consider the expression of $P_U$ in Eq. (88). Let

$$s_U = v_U - u_U, \tag{90}$$

where $u_U = p_{min} n_U$. Then it is easy to see that the expressions of $p_{min}$ (i.e., $p_{min} = u_U / n_U$) and
$P_U$, given by Eq. (88), are identical to the expressions of $p(d)$ and $P(x > s, u)$, given by Eqs. (48)
and (50), respectively, after the following substitutions: $d \leftarrow u_U$, $n \leftarrow n_U$, $u \leftarrow u_U$, $s \leftarrow s_U$. By
applying the result of Lemma 6 we have the following bound


E  (''^)pinin{^-Pmin)^^-^ 

/T /l-(.„-l)/2ni,y“  («;;  +  3^,)^ 

27r  p  +  {sv  -  l)/2nv  )  4su>H, 


Next we compute $s_U$ such that


(91) 


I  =  ( 

By applying the same approximations used in proving Lemma 7 (see Ineq. (63)), i.e., $s_U \ll n_U$,
$s_V \ll n_V$, and $s_M \ll n_M$, respectively, we get


1  -  (<Si7  -  l)l2nu'\^^^  {nu  +  su)"^ 
1  +  {su  —  l)/2nu  j  AsuUu 
and similarly


By using the above values for $s_U$, $s_V$, and $s_M$, respectively, and by the definition of $P^f(x > s, u)$
and Ineq. (87), we have


$$P(x > s, u) \le P^f(x > s, u) \le P_U + P_V + P_M \le 3 \cdot \frac{\varepsilon}{3} = \varepsilon.$$


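Now it remains to compute $s$. First, recall that $s_U = v_U - u_U$, $s_V = v_V - u_V$, $s_M = v_M - u_M$,
where $u_U = p_{min} n_U$, $u_V = p_{max} n_V$, and $u_M = p_M n_M$ (see Eq. (90)). From here we obtain

$$\begin{aligned}
s_U + s_V + s_M &= (v_U - u_U) + (v_V - u_V) + (v_M - u_M) \\
&= v - u_U - u_V - u_M \\
&= v - p_{min} n_U - p_{max} n_V - p_M n_M \\
&= v - \sum_{i \in U} p_{min} - \sum_{i \in V} p_{max} - \sum_{i \in M} p_M \\
&= v - \sum_{i=1}^{n} p_i = s.
\end{aligned} \tag{96}$$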

As both $P(x > s, u)$ and $P^f(x > s, u)$ decrease in $s$, for our purpose it is sufficient to
determine an upper bound for $s$. From Eqs. (93), (94) and (96) this reduces to computing

(97) 

subject to $n_U + n_V + n_M = n$. Since the function $\sqrt{x \ln x}$ is concave, it follows that expression
(97) achieves its maximum for $n_U = n_V = n_M = n/3$. Finally, we choose

which  completes  the  proof.  □ 
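The concavity step in the proof can be illustrated with a brute-force search. This is a minimal sketch, assuming $g(x) = \sqrt{x}$ as a simple stand-in for the concave summand; the function name `best_split` and the value n = 30 are hypothetical. It enumerates every way of splitting n flows among the three sets and reports the split that maximizes $g(n_U) + g(n_V) + g(n_M)$.

```python
from math import sqrt

def best_split(n, g=sqrt):
    """Return the split (n_U, n_V, n_M) with n_U + n_V + n_M = n that maximizes the sum of g."""
    best, best_val = None, float("-inf")
    for n_u in range(n + 1):
        for n_v in range(n - n_u + 1):
            n_m = n - n_u - n_v
            val = g(n_u) + g(n_v) + g(n_m)
            if val > best_val:
                best, best_val = (n_u, n_v, n_m), val
    return best, best_val

# Hypothetical example with n divisible by 3.
split, value = best_split(30)
print(split, value)
```

For a strictly concave summand the equal split is the unique maximizer, which is why the worst case for the buffer bound is reached when the three sets have the same size.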

By  combining  Lemmas  7  and  10  we  have  the  following  result 
Theorem 2 Consider a server traversed by n flows. Assume that the arrival times of the packets
from different flows are independent, and that all packets have the same size. Then, for any given
probability ε, the queue size at any time instant during a server busy period is asymptotically
bounded above by s, where

s  = 

with a probability larger than 1 − ε. For identical reservations β = 1; for heterogeneous
reservations β = 3.


Appendix  D:  Proof  of  Theorem  3 


Theorem 3 Consider a link of capacity $C$ at time $t$. Assume that no reservation terminates
and there are no reservation failures or request losses after time $t$. Then, if there is sufficient
demand after $t$, the link utilization asymptotically approaches $C(1 - f)/(1 + f)$.


Proof. If the aggregate reservation at time $t$ is larger than $C(1 - f)/(1 + f)$, the claim holds
trivially. Next, we consider the case in which the aggregate reservation is less than $C(1 - f)/(1 + f)$.

In particular, let $C(1 - f)/(1 + f) - \Delta$ be the aggregate reservation at time $t$. Without loss
of generality assume $t = u_k$. Then we will show that if no reservation terminates, no reservation
request fails, and there is enough demand after time $u_k$, then at least $(1 + f)\Delta/2$ bandwidth is
allocated during the next two slots, i.e., during the interval $(u_k, u_{k+2}]$. Thus, for any arbitrarily
small real $\varepsilon$, we are guaranteed that after at most


$$\frac{\ln(\varepsilon/\Delta)}{\ln((1 - f)/2)} \tag{100}$$

slots the aggregate reservation will exceed $C(1 - f)/(1 + f) - \varepsilon$.
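To see how the bound in Eq. (100) behaves, the sketch below iterates the worst case in which each pair of slots allocates exactly $(1 + f)\Delta/2$ of the current gap $\Delta$, so that the gap to $C(1 - f)/(1 + f)$ shrinks by a factor of $(1 - f)/2$ per pair of slots. It is only illustrative: the function names and the numerical values of $f$, $\Delta$, and $\varepsilon$ are assumptions, and counting in units of slot pairs is one reading of the argument above rather than something stated by Eq. (100) itself.

```python
from math import ceil, log

def rounds_bound(delta, eps, f):
    """Smallest k with delta * ((1 - f) / 2)**k <= eps, computed from the closed form."""
    if delta <= eps:
        return 0
    return ceil(log(eps / delta) / log((1 - f) / 2))

def rounds_simulated(delta, eps, f):
    """Count slot pairs until the gap drops to eps, allocating (1 + f)/2 of the gap each time."""
    k, gap = 0, delta
    while gap > eps:
        gap -= (1 + f) * gap / 2   # worst case: only the guaranteed minimum is allocated
        k += 1
    return k

# Hypothetical values: f = 0.1, initial gap 100 units, target gap 0.5 units.
f, delta, eps = 0.1, 100.0, 0.5
print(rounds_bound(delta, eps, f), rounds_simulated(delta, eps, f))
```

Because the gap shrinks geometrically, the number of rounds grows only logarithmically in $\Delta/\varepsilon$, which is what makes the convergence to $C(1 - f)/(1 + f)$ fast in practice.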

From Eq. (20) it follows that the maximum capacity which can be allocated during the interval
$(u_k, u_{k+1}]$ is $\max(C - R_{cal}(u_k), 0)$. Assume then that $\Delta_1$ capacity is allocated during $(u_k, u_{k+1}]$,
where $\Delta_1 \le \max(C - R_{cal}(u_k), 0)$. Consider two cases, according to whether $\Delta_1 \ge \Delta$ or not. If $\Delta_1 \ge \Delta$ the
proof follows trivially.

Assume $\Delta_1 < \Delta$. Then we will show that at time $u_{k+2}$ the aggregate reservation can increase
by at least a constant fraction of $\Delta$. From Figure 16 it is easy to see that for any reservation
continuously active during an interval $(u_k, u_{k+1}]$ we have

$$b_i(u_k, u_{k+1}) \le r_i(T_W + T_I + T_J). \tag{101}$$
Figure 16: The scenario in which the upper bound of $b_i$, i.e., $r_i(T_W + T_I + T_J)$, is achieved. The arrows
represent packet transmissions. $T_W$ is the averaging window size; $T_I$ is an upper bound on the packet
inter-departure time; $T_J$ is an upper bound on the delay jitter. Both m1 and m2 fall just inside the
estimation interval, $T_W$, at the core node.


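Since no reservation terminates during $(u_k, u_{k+1}]$, the set of active flows $\mathcal{C}(u_{k+1})$ consists of
$\mathcal{C}(u_k)$ together with the flows that become active during $(u_k, u_{k+1}]$. Let
$a_i \in (u_k, u_{k+1}]$ be the time when flow $i$ becomes active during $(u_k, u_{k+1}]$. Since $b_i(a_i, u_{k+1}) \le
b_i(u_k, u_{k+1})$, by using Eq. (101) we obtain

$$B(u_k, u_{k+1}) = \sum_{i \in \mathcal{C}(u_{k+1})} b_i(u_k, u_{k+1}) \le \sum_{i \in \mathcal{C}(u_{k+1})} r_i(T_W + T_I + T_J). \tag{102}$$

From here we get

$$R_{DPS}(u_k, u_{k+1}) \le R(u_{k+1})(1 + f). \tag{103}$$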

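Since there are no duplicate requests or partial reservation failures after time $t = u_k$, we have
$\Delta_1 = R_{new}(u_{k+1})$. From here and from Eq. (20) and Ineq. (103) we have

$$R_{cal}(u_{k+1}) \le \frac{R_{DPS}(u_k, u_{k+1})}{1 - f} + \Delta_1 \le R(u_{k+1})\frac{1 + f}{1 - f} + \Delta_1. \tag{104}$$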

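In addition, we have $R(u_{k+1}) = R(u_k) + \Delta_1$. Since $R(u_k) = C(1 - f)/(1 + f) - \Delta$, from Eq. (104)
it follows

$$C - R_{cal}(u_{k+1}) \ge C - R(u_{k+1})\frac{1 + f}{1 - f} - \Delta_1 \ge \frac{1 + f}{1 - f}\Delta - \frac{2}{1 - f}\Delta_1. \tag{105}$$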

Finally, consider two cases: (a) $\Delta_1 < \Delta(1 + f)/2$, or (b) not. If (a) is true then the
link can allocate up to

$$\Delta_1 + C - R_{cal}(u_{k+1}) \ge \Delta_1 + \frac{1 + f}{1 - f}\Delta - \frac{2}{1 - f}\Delta_1 = \frac{1 + f}{1 - f}(\Delta - \Delta_1) \ge \frac{1 + f}{2}\Delta \tag{106}$$

capacity during the time interval $(u_k, u_{k+2}]$. In case (b) we have trivially $\Delta_1 \ge \Delta(1 + f)/2$. Thus
in both cases we can allocate at least $\Delta(1 + f)/2$ new capacity during $(u_k, u_{k+2}]$. □
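The case analysis can be sanity-checked numerically. The sketch below assumes a particular reading of Ineqs. (105) and (106), namely that the capacity still available at $u_{k+1}$ is at least $\frac{1+f}{1-f}\Delta - \frac{2}{1-f}\Delta_1$; the function names and the parameter grid are hypothetical. For each $\Delta_1$ in $[0, \Delta]$ it takes the capacity guaranteed over two slots to be $\Delta_1$ plus that lower bound (clamped at zero), and reports the minimum over the grid next to $(1 + f)\Delta/2$.

```python
def two_slot_allocation(delta, delta1, f):
    """Lower bound on the capacity allocated during (u_k, u_{k+2}] when delta1 is allocated in the first slot."""
    # Assumed form of the right-hand side of Ineq. (105), clamped at zero.
    remaining = (1 + f) / (1 - f) * delta - 2 * delta1 / (1 - f)
    return delta1 + max(remaining, 0.0)

def worst_case(delta, f, steps=1000):
    """Minimum of the two-slot lower bound over a grid of delta1 values in [0, delta]."""
    return min(two_slot_allocation(delta, delta * i / steps, f)
               for i in range(steps + 1))

# Hypothetical gap of 10 units and a few values of f.
delta = 10.0
for f in (0.05, 0.1, 0.3):
    print(f, worst_case(delta, f), (1 + f) * delta / 2)
```

Under this assumption the minimum is reached near $\Delta_1 = \Delta(1 + f)/2$, the boundary between cases (a) and (b), where both branches give the same value.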




